Computational analysis of splicing errors and mutations in human transcripts.

Yerbol Z Kurmangaliyev, Mikhail S Gelfand.

Abstract

BACKGROUND: Most retained introns found in human cDNAs generated by high-throughput sequencing projects seem to result from underspliced transcripts, and thus they capture intermediate steps of pre-mRNA splicing. On the other hand, mutations in splice sites cause exon skipping of the respective exon or activation of pre-existing cryptic sites. Both types of events reflect properties of the splicing mechanism.
RESULTS: The retained introns were significantly shorter than constitutive ones, and skipped exons are shorter than exons with cryptic sites. Both donor and acceptor splice sites of retained introns were weaker than splice sites of constitutive introns. The authentic acceptor sites affected by mutations were significantly weaker in exons with activated cryptic sites than in skipped exons. The distance from a mutated splice site to the nearest equivalent site is significantly shorter in cases of activated cryptic sites compared to exon skipping events. The prevalence of retained introns within genes monotonically increased in the 5'-to-3' direction (more retained introns close to the 3'-end), consistent with the model of co-transcriptional splicing. The density of exonic splicing enhancers was higher, and the density of exonic splicing silencers lower in retained introns compared to constitutive ones and in exons with cryptic sites compared to skipped exons.
CONCLUSION: Thus the analysis of retained introns in human cDNA, exons skipped due to mutations in splice sites and exons with cryptic sites produced results consistent with the intron definition mechanism of splicing of short introns, co-transcriptional splicing, dependence of splicing efficiency on the splice site strength and the density of candidate exonic splicing enhancers and silencers. These results are consistent with other, recently published analyses.

Year:  2008        PMID: 18194514      PMCID: PMC2234086          DOI: 10.1186/1471-2164-9-13

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Vertebrate genes consist of relatively short exons separated by considerably larger introns. The introns of lower eukaryotes, invertebrates and plants are much shorter. This difference may be explained by the preference for one of two possible mechanisms for recognition of the exon-intron boundaries by the splicing machinery. In the case of long introns, the exon definition mechanism initially recognizes pairs of splicing sites corresponding to one exon. Conversely, short introns are recognized by the intron definition mechanism, which pairs splicing sites across introns [1]. Historically, the intron definition mechanism seems to be the ancestral one, whereas exon definition is likely a relatively recent innovation that, in particular, created the possibility of regulated alternative splicing [2]. These models yield different consequences of mutations that destroy splicing sites. Errors in exon definition should lead to exon skipping or, if there are strong cryptic sites, the use of the latter, whereas errors in intron definition should cause intron retention. Indeed, exactly this behavior was observed in in vivo and in vitro experiments (reviewed by [1]), and in early analyses of disease-causing mutations of human genes [3,4]. These predictions also agree with the distribution of alternative splicing types in different organisms. In vertebrates, where long introns are frequent, the prevalent type of alternative splicing is exon skipping [5,6], while in plants, where the majority of introns are short, the most frequent type is intron retention [5,7]. Intron retention is the least studied type of alternative and aberrant splicing. In contrast with other types of alternative splicing, which involve the choice between different splice sites, intron retention represents complete absence of splicing. Some specific features of retained introns have become clear in recent studies of human [8,9] and plant transcriptomes [10].
Retained introns were found to differ from other introns in GC content, which was lower than in exons but higher than in constitutively spliced out introns. Retained introns were shown to be shorter on average than constitutively spliced out ones and exhibited a tendency to occur in 5'- and 3'-untranslated regions [8-10]; they also have weaker sites [9]. In several cases intron retention clearly has a function. A considerable fraction of retained introns encode identifiable protein domains or parts thereof [8,11]. In some cases intron retention produces different functional isoforms (EBNA-3 family antigens of the Epstein-Barr virus [12]); isoforms with aberrant function (cancer-specific form of cholecystokinin 2 receptor [13]); truncated proteins that may be involved in regulation (cold-dependent lipid metabolism in plants [14], nuclear transport of retroviruses [15], autoregulation of splicing [16]); non-functional proteins (P-element of Drosophila [17] or rat cytochrome P450 CYP2C11 in stressed liver [18]); proteins with unknown function (serine protease kallikrein [19,20]); or, finally, isoforms with no known functional differences between the variants (hormone urocortin 1 prepropeptide [21], cyclooxygenase [22], D1 dopamine receptor (DR1) interacting protein calcyon [23], mouse homeodomain transcription factor Tgif2 [24]). Moreover, intron retention may be conserved in vertebrates, e.g. intron 3 of splicing regulator of the SR family 9G8 [16], or species-specific, e.g. intron 2 of Tgif2, present in the mouse gene, but not in its human ortholog Tgif2 [24]. However, it is likely that many cases of observed intron retention were caused by errors of the splicing machinery. Retained introns are the least conserved type of elementary alternatives [25].
Moreover, large-scale projects that aim at sequencing full-length cDNA use normalization procedures to enrich low-copy transcripts, and these procedures seem to increase the fraction of underspliced transcripts that retain one or several introns [26,27]. Traditionally such artifacts in cDNA databases were treated as a nuisance and filtered out in attempts to create "clean" sets of alternative isoforms. We tried to look at introns retained in human cDNA data from another angle, assuming that they capture intermediate states of the splicing process and thus provide a glimpse of the splicing mechanisms. Another way to look at this mechanism is to analyze consequences of mutations in splice sites. This has also been the subject of several very recent studies. Such mutations have two major possible outcomes: exon skipping and activation of cryptic sites, whereas intron retention is relatively rare [3,28-30]. One important determinant of the cryptic donor splice site phenotype is the presence of a strong candidate donor splice site in the vicinity of mutated sites [3,31]. Cryptic acceptor splice sites are more frequent in exons than in introns, likely due to depletion of AG dinucleotides upstream of the original acceptor sites [32]. There are differences in the distribution of candidate exonic enhancers and silencers between skipped exons and exons with activated cryptic sites [33]. Here we systematically studied aberrant and mutated splicing. Specifically, we compared lengths of affected and adjacent introns and exons, as well as strengths of splice sites and the distribution of predicted splicing enhancers and silencers in these and adjacent exons and introns. While confirming many earlier predictions, our study also provides a number of new observations that are largely consistent with existing models of the splicing mechanisms.

Results

Comparison of retained and constitutive introns

Sets of retained (Fig. 1) and constitutive (constitutively spliced out) introns were constructed as described in Data and Methods and compared with the aim of identifying possible determinants of intron retention. We considered the distribution of intron lengths and of lengths of the flanking exons, scores of intron splice sites and the distal sites in the flanking exons (the acceptor site of the upstream exon and the donor site of the downstream exon), densities of exonic cis-acting elements, and intron positions within the gene. The results are summarized in Table 1.
Figure 1

Definition of scored intron retention events. Gray rectangles represent exons of the RefSeq gene and mRNA. Exon/intron boundaries are marked by dotted lines.

Table 1

Properties of retained and constitutive introns. For all intron parameters the medians are reported. The last two columns report the statistical significance of the differences of the distributions by the Kolmogorov-Smirnov test (KS) and Student's t-test (ST); n/s – non significant.

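| | Retained introns | Constitutively spliced introns | KS | ST |
|---|---|---|---|---|
| Set size | 1197 | 137580 | | |
| Intron length (nucleotides) | 337 | 1481 | <10^-15 | <10^-15 |
| Splice site scores | | | | |
| Acceptor site of the 5'-exon | 18.60 | 19.09 | <10^-15 | <10^-11 |
| Donor site | 18.17 | 18.80 | <10^-15 | <10^-15 |
| Acceptor site | 18.03 | 19.06 | <10^-15 | <10^-15 |
| Donor site of the 3'-exon | 18.74 | 18.79 | n/s | n/s |
| Cis-acting elements (candidate sites per nucleotide) | | | | |
| ESEfinder: SC35 | 0.046 | 0.034 | <10^-15 | <10^-15 |
| ESEfinder: SF2/ASF | 0.040 | 0.028 | <10^-15 | <10^-15 |
| ESEfinder: SRp40 | 0.041 | 0.038 | <10^-15 | |
| ESEfinder: SRp55 | 0.022 | 0.022 | <10^-15 | n/s |
| RESCUE-ESE | 0.050 | 0.068 | <10^-15 | <10^-15 |
| PESE | 0.043 | 0.035 | <10^-15 | <10^-15 |
| PESS | 0.013 | 0.048 | <10^-15 | <10^-15 |
| Relative position | | | | |
| by ordinal number | 0.6 | 0.5 | <10^-15 * | <10^-15 |
| by gene | 0.671 | 0.597 | <10^-9 | <10^-15 |
| by mRNA | 0.446 | 0.354 | <10^-15 | <10^-15 |
| by mRNA w/o last exon | 0.688 | 0.575 | <10^-15 | <10^-15 |

* Chi-square test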

The distributions of the intron lengths of retained and constitutive introns were significantly different (Fig. 2, two-sample Kolmogorov-Smirnov test P < 10^-15). The retained introns tend to be shorter than constitutively spliced out ones: 84% of the retained introns were shorter than 1000 nucleotides, compared to only 40% of the constitutive introns. The median size of the retained introns was 337 nucleotides, whereas the median size of the constitutive introns was 1481 nucleotides. No significant differences between distributions of flanking exon lengths were observed (data not shown).
Figure 2

Histograms of intron lengths. Red: retained introns; blue: constitutive introns.

Scores of the intron splice sites and splice sites of the flanking exons for retained and constitutively spliced introns were calculated using a positional weight matrix as described in Data and Methods. Splice sites of retained introns were weaker: the distributions of the splice site scores for the retained and constitutive introns were significantly different for both acceptor and donor sites (two-sample Kolmogorov-Smirnov test P < 10^-15). The median scores for the donor sites of the retained and constitutive introns were 18.2 and 18.8 respectively, whereas for the acceptor sites they were 18.03 and 19.06 respectively. The donor site scores of the 3'-flanking (downstream) exons were similar for the retained and constitutive introns, whereas the acceptor sites of the 5'-flanking (upstream) exons were considerably weaker for the retained introns compared to the constitutive ones, with medians 18.6 and 19.1, respectively (two-sample Kolmogorov-Smirnov test P < 10^-10). Densities of cis-acting elements of both types of introns were calculated using three available programs, ESEfinder [34], RESCUE-ESE [35], and PESX [36,37], as described in Data and Methods. The results are described in Table 1. The densities of most types of predicted exonic splicing enhancers (ESEs) were higher in the retained introns, whereas the density of exonic splicing silencers (ESSs) was higher in the constitutive introns (Fig. 3, 4).
Figure 3

Histograms of ESE densities predicted by ESEfinder. Red: retained introns; blue: constitutive introns.

Figure 4

Histograms of ESE densities predicted by RESCUE-ESE and PESX/PESE and ESS densities predicted by PESX/PESS. Red: retained introns; blue: constitutive introns.

The average densities of all four ESEfinder motifs were higher in the retained introns (Fig. 3). The maximal difference between the median densities was observed for the SF2/ASF sites (median densities 0.040 and 0.028 for the retained and constitutive introns, respectively), whereas the smallest difference was observed for the SRp55 sites (median densities 0.0217 and 0.0215, non-significant). The density of PESE octamers (enhancers) was also higher in the retained introns (Fig. 4), whereas the density of PESS octamers (silencers) was higher in the constitutive introns (Fig. 4). In contrast, the density of ESE hexamers predicted by RESCUE-ESE was significantly higher in the constitutively spliced introns than in the retained ones (Fig. 4). All these differences were statistically significant (two-sample Kolmogorov-Smirnov test P < 10^-15). The relative position of an intron in a gene was defined as the ratio RP = D/L, where D was the distance from the gene 5'-end to the intron 5'-end (the donor site), and L was the gene length (the distance between 5'- and 3'-ends, as listed in RefSeq). Since terminal exons and introns may have considerably different lengths ([38], and data not shown), the distances were calculated in several different settings. Firstly, we used unspliced genes, as annotated in RefSeq, and in this case the distances were calculated using the genomic sequence. Secondly, we considered spliced genes: all introns were removed and the studied intron was reduced to a single point, the "intron shadow", and the distances were calculated using the mRNA sequence. Thirdly, we considered spliced genes with the last exon removed as well.
Finally, we defined the relative position of an intron as its ordinal number divided by the total number of introns in a gene. The constitutive introns (blue bars in Fig. 5) are shifted towards the 3'-end in the unspliced gene calculations (Fig. 5b), and towards the 5'-end in spliced gene calculations (Fig. 5c). This is consistent with decreasing intron density and increasing exon length in the 5'-to-3' direction [38]. Indeed, when the last 3'-terminal intron is removed, the distribution becomes almost uniform (Fig. 5d).
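The four settings for the relative position can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `relative_positions`, the representation of a gene as a 5'-to-3' list of half-open `(start, end)` exon intervals, and the 1-based ordinal numbering are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

def relative_positions(exons: List[Tuple[int, int]], intron_index: int) -> Dict[str, float]:
    """Relative position of one intron in the four settings described in the text.

    exons: half-open (start, end) intervals ordered 5'->3' (assumed layout).
    intron_index: 0-based index of the intron between exons[i] and exons[i + 1].
    """
    gene_start, gene_end = exons[0][0], exons[-1][1]
    donor = exons[intron_index][1]  # 5'-end of the intron (donor site)

    # Unspliced gene: distances measured on the genomic sequence.
    by_gene = (donor - gene_start) / (gene_end - gene_start)

    # Spliced gene: the intron collapses to a point ("intron shadow") on the mRNA.
    upstream_exonic = sum(end - start for start, end in exons[: intron_index + 1])
    mrna_length = sum(end - start for start, end in exons)
    by_mrna = upstream_exonic / mrna_length

    # Spliced gene with the last exon removed.
    last_exon_length = exons[-1][1] - exons[-1][0]
    by_mrna_no_last = upstream_exonic / (mrna_length - last_exon_length)

    # Ordinal number (assumed 1-based) divided by the number of introns.
    by_ordinal = (intron_index + 1) / (len(exons) - 1)

    return {
        "by_ordinal": by_ordinal,
        "by_gene": by_gene,
        "by_mrna": by_mrna,
        "by_mrna_no_last": by_mrna_no_last,
    }

toy_gene = [(0, 100), (300, 400), (1000, 1200)]  # made-up coordinates
first_intron = relative_positions(toy_gene, 0)
```

With the toy gene above, the first intron sits at 100/1200 of the unspliced gene but at 100/400 of the mRNA, which shows how long downstream introns and a long last exon pull the two measures apart.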
Figure 5

Histograms of the relative intron positions. A: the relative (ordinal) intron number; B: unspliced genes; C: spliced genes; D: spliced genes with the last exon removed (see the text for the detailed explanation). Left axis: the fraction of introns in each position bin is given for retained (red) and constitutive (blue) introns separately. Points 0 and 1 on the horizontal axis correspond to the 5'- and 3'-ends of the gene, respectively. Right vertical axis and the orange triangle curve: the fraction of retained introns among all introns in the bin.

The situation with retained introns is dramatically different (two-sample Kolmogorov-Smirnov test P < 10^-15 for relative intron positions in the cases of spliced genes and spliced genes with the last exon removed, and P < 10^-9 for unspliced genes; the χ2-test P < 10^-15 for the ordinal intron number). The distribution of the retained introns (red bars in Fig. 5) is considerably shifted towards the 3'-end in all settings, as compared to the constitutive introns. Accordingly, the fraction of retained introns increases in the 5'-to-3' direction, leveling off at about the middle of the gene (the orange curve in Fig. 5).

Comparison of skipped and cryptic-site exons

The sets of splice-site inactivating mutations were collected as described in Data and Methods. Only mutations directly in the donor and acceptor sites were considered. The exons affected by the mutations were divided into skipped exons (S-exons) and exons utilizing cryptic sites (C-exons). The donor and acceptor site mutations were considered both separately and jointly, to increase the statistical power of the observations. The results are summarized in Table 2.
Table 2

Properties of skipped exons (S-exons) and exons with cryptic sites (C-exons). For all exon parameters the medians are reported. The last column reports parameters of all internal exons in our dataset of RefSeq genes. MW: the statistical significance of the differences between the S- and C-exons by the Mann-Whitney test; n/s: non-significant.

| | S-exons | C-exons | MW | Internal exons |
|---|---|---|---|---|
| Set size | | | | |
| Mutated donor sites | 67 | 42 | | |
| Mutated acceptor sites | 42 | 72 | | |
| All | 109 | 114 | | 154846 |
| Exon length (nucleotides) | | | | |
| Mutated donor sites | 114 | 147 | 0.024 | |
| Mutated acceptor sites | 112.5 | 130 | n/s | |
| All | 114 | 136 | 0.020 | 123 |
| Densities of cis-acting elements (candidate sites per nucleotide) | | | | |
| ESEfinder: SC35 | | | | |
| Mutated donor sites | 0.043 | 0.042 | n/s | |
| Mutated acceptor sites | 0.038 | 0.045 | n/s | |
| All | 0.042 | 0.043 | n/s | 0.038 |
| ESEfinder: SF2/ASF | | | | |
| Mutated donor sites | 0.025 | 0.037 | 0.048 | |
| Mutated acceptor sites | 0.036 | 0.041 | n/s | |
| All | 0.028 | 0.040 | 0.005 | 0.036 |
| ESEfinder: SRp40 | | | | |
| Mutated donor sites | 0.034 | 0.043 | 0.006 | |
| Mutated acceptor sites | 0.040 | 0.043 | n/s | |
| All | 0.035 | 0.043 | 0.004 | 0.040 |
| ESEfinder: SRp55 | | | | |
| Mutated donor sites | 0.028 | 0.024 | n/s | |
| Mutated acceptor sites | 0.022 | 0.023 | n/s | |
| All | 0.025 | 0.023 | n/s | 0.023 |
| RESCUE-ESE | | | | |
| Mutated donor sites | 0.090 | 0.108 | n/s | |
| Mutated acceptor sites | 0.100 | 0.080 | n/s | |
| All | 0.091 | 0.094 | n/s | 0.099 |
| PESE | | | | |
| Mutated donor sites | 0.048 | 0.082 | 0.007 | |
| Mutated acceptor sites | 0.057 | 0.055 | n/s | |
| All | 0.055 | 0.064 | 0.023 | 0.064 |
| PESS | | | | |
| Mutated donor sites | 0.012 | 0.008 | n/s | |
| Mutated acceptor sites | 0.009 | 0.007 | n/s | |
| All | 0.011 | 0.007 | n/s | 0.007 |
| Splice site scores | | | | |
| Mutated donor sites | | | | |
| Authentic donor sites | 18.52 | 18.49 | n/s | 18.82 |
| Acceptor sites of the (upstream) exon | 18.70 | 19.67 | n/s | 19.08 |
| Acceptor sites of the (downstream) intron | 19.37 | 18.98 | n/s | 19.09 |
| Mutated acceptor sites | | | | |
| Authentic acceptor sites | 19.59 | 18.72 | 0.05 | 19.08 |
| Donor sites of the (downstream) exon | 18.44 | 18.56 | n/s | 18.82 |
| Donor sites of the (upstream) intron | 18.48 | 18.51 | n/s | 18.79 |
| Distance to the closest candidate site (nucleotides) | | | | |
| Mutated donor sites | 220.5 | 75 | 0.067 | 289 |
| Mutated acceptor sites | 185 | 66 | 0.024 | 81 |

The S-exons were found to be significantly shorter than the C-exons (median sizes 114 and 136 nucleotides). No significant differences were observed in the lengths of flanking introns (data not shown). Scores of authentic splice sites and all splice sites in the adjacent exons and introns for the S- and C-exons were calculated as described in Data and Methods. Unexpectedly, the authentic acceptor sites affected by mutations were significantly weaker in the C-exons than in the S-exons, with the median scores 18.72 and 19.59, respectively (the Mann-Whitney test P = 0.05). No significant differences were observed in the distribution of authentic site scores in the S- and C-exons with mutated donor sites, nor in the distribution of scores of all other considered sites. The relative enrichment of potential cryptic sites near the mutated sites was estimated by calculating the distance to the closest equivalent splice site; the latter were defined as candidate splice sites of the same type as the authentic site and having the same or higher splice site score. The search for equivalent splice sites was limited to the adjacent intron and exon, and the cases when such sites were absent were not taken into account in calculations. For both the donor and acceptor site mutations, the S- and C-exons differed dramatically: the equivalent sites were located much closer to the authentic splice sites of the C-exons than to those of the S-exons.
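The distance to the closest equivalent site, as defined above, can be illustrated with a short Python sketch. It assumes that candidate sites of the same type in the adjacent exon and intron have already been scored and are passed in as `(position, score)` pairs; the function name and this input layout are hypothetical and are not taken from the original analysis.

```python
from typing import Iterable, Optional, Tuple

def closest_equivalent_distance(
    authentic_pos: int,
    authentic_score: float,
    candidates: Iterable[Tuple[int, float]],
) -> Optional[int]:
    """Distance (nt) from the authentic site to the nearest equivalent site.

    A candidate is "equivalent" if its score is the same as or higher than
    the authentic site score. Returns None when no such candidate exists,
    mirroring the rule that such cases are left out of the calculations.
    """
    distances = [
        abs(pos - authentic_pos)
        for pos, score in candidates
        if pos != authentic_pos and score >= authentic_score
    ]
    return min(distances) if distances else None

# Made-up candidate sites: (position, score).
toy_candidates = [(90, 17.0), (130, 19.5), (40, 20.0)]
toy_distance = closest_equivalent_distance(100, 18.5, toy_candidates)
```

In the toy data the site at position 90 is nearer but weaker than the authentic site, so it is ignored and the distance is taken to the site at position 130.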
The densities of ESEfinder SF2/ASF and SRp40 motifs, as well as of PESE octamers, were significantly higher in the C-exons than in the S-exons with mutated donor sites, although the tendency was the same for most other types of ESEs and also in exons with mutated acceptor sites. The densities of PESS in exons with mutated splice sites of both types were higher in the S-exons, but the difference was not significant even for the combined sets (Kolmogorov-Smirnov test, P = 0.09).

Discussion

The overall results of this study seem to agree with the existing biological models. The fact that retained introns are relatively short is consistent with the possibility that such introns are spliced out by the intron definition mechanism, as in this case splicing aberrations should lead to intron retention. When this study was completed, similar observations were also made in [9]. The relative weakness of splicing sites in retained introns and the fact that exons skipped due to mutations of splice sites do not have strong cryptic sites in the immediate vicinity show that the site scores are a reasonable approximation to site strength and may determine their functionality [3,31-33,39,40]. Unlike [3], the relative dearth of cryptic candidate sites in the vicinity of the C-exons was not confined exclusively to the exons with mutated donor splice sites. On the other hand, we could not confirm the observation that strong acceptor sites are a characteristic of the C-exons with mutated donor sites [31]. In contrast to previous studies that were primarily interested in functional (e.g. conserved) alternative splicing of retained introns [8,10], we did not enforce possible functionality. One consequence is that the majority of retained introns studied here are unlikely to encode functional proteins, as only 3.3% of them are frame-preserving (this number is close to the 4.6% of in-frame retained introns observed in Arabidopsis [10]). This does not preclude the possible role of such introns in regulation, either on the protein level (e.g. leading to the synthesis of shortened proteins with regulatory function) or on the mRNA level (leading to NMD-inducing isoforms in some specific conditions); some examples of such regulatory mechanisms have been mentioned in the Background section. However, both the procedure and the obtained results seem to indicate that the majority of retained introns in our study come from underspliced transcripts.
In line with this reasoning, the weakness of sites in retained introns may have two explanations. The retained introns might come from underspliced transcripts (weaker sites imply lower splicing efficiency) or be instances of regulated alternative splicing. Indeed, functional alternative splice sites are weaker than constitutive splice sites [41,42]. Further, longer introns in general tend to have stronger splice sites; however, the latter trend becomes observable only for bona fide introns longer than 1500 nt [43], and thus should not influence the majority of retained introns studied here. It has been demonstrated that both human and plant retained introns are more prevalent in the 5'- and especially 3'-untranslated regions, compared to the protein-coding regions of the mRNAs [8,10]. This has been ascribed to elimination of abnormally spliced mRNAs by the NMD mechanism [44]. However, this would not explain the observed prevalence of NMD-inducing retained introns in the 5'-regions. Our results demonstrate a monotonic increase in the fraction of mostly retained introns in the 5'-to-3' direction. This is consistent with some degree of co-transcriptional splicing (as opposed to simple commitment to splicing with the actual process starting simultaneously for all introns) observed experimentally [45]. However, this correlation is not straightforward. Indeed, since we considered only introns bounded on both sides by internal exons, and required that the boundaries of the exon containing the unspliced intron coincided exactly with the boundaries of the corresponding exon-intron-exon chain in the RefSeq mRNA isoform (see Methods), all retained introns considered here are followed by spliced out introns. This means that the observed tendency may not be a simple consequence of completely unspliced 3'-termini.
The observed differences in the density of exonic splicing enhancers in the retained and constitutive introns as well as in the C-exons and S-exons also seem to have a natural biological interpretation. Indeed, a high density of ESE-like sites in a (relatively short) intron may lead to misrecognition of this intron as a part of an exon together with the flanking exons. Similarly, a high density of ESEs in an exon with a mutated site may force the splicing machinery to retain this exon and use a cryptic site, whereas ESSs might provoke skipping of the exon. The puzzling observation that candidate enhancers predicted by RESCUE-ESE were more abundant in the constitutively spliced introns than in retained ones may be explained by the fact that this method, unlike PESX, is based on the comparison of oligonucleotide frequencies in constitutive and alternative exons and does not control for the distribution of these oligonucleotides in introns [35-37]. A similar observation was recently made in [33]. Another coincidence between our study and [33] is that not all SELEX-based ESEfinder candidate exonic splicing enhancers have different densities in the S-exons and C-exons: in [33], the most pronounced effect was observed for SF2/ASF, whereas in our study a more statistically significant difference was seen for SRp40. In retained introns, the most prevalent candidate splicing enhancers were those for SF2/ASF and SC35, trailed by those for SRp40 and, marginally significant, for SRp55. Unfortunately, at present it seems impossible to repeat these analyses with intronic splicing enhancers and silencers, since no programs for their recognition are available.
A more convoluted, but still plausible explanation may be found for the observed significant difference in the strength of authentic acceptor sites of the C-exons and S-exons: an exon with a weak splice site already contains more splicing enhancers than an exon with strong sites [35,46,47], and thus it is more likely to become a C-exon if the site is disrupted by a mutation.

Conclusion

Thus the analysis of retained introns in human cDNA, exons skipped due to mutations in splice sites and exons with cryptic sites produced results consistent with the intron definition mechanism of splicing of short introns and the model of co-transcriptional splicing. Retained introns tend to be short and contain a higher density of splicing enhancers. Skipped exons contain fewer candidate splicing enhancers and more silencers than exons with activated cryptic sites. Skipped exons also do not have strong candidate splice sites in the vicinity of mutated ones.

Methods

Set of RefSeq scaffolds

Human genome (version 18, March 2006) and alignments of RefSeq genes (21.02.07) and high-throughput cDNAs (16.06.07) were downloaded from the UCSC genome browser [48]; the EST data were not used. Initially the dataset contained 25388 RefSeq mRNAs. Isoforms of alternatively spliced genes were clustered by the RefSeq gene name. To avoid redundancy in the structures of alternatively spliced genes, only the longest isoform for each such gene was retained and used as the scaffold in all further calculations. Isoform lengths were calculated for spliced mRNAs. The final set of RefSeq genes consisted of 18458 genes containing 154846 internal exons and 138777 introns between such exons. All measurements and comparisons of internal exons and introns were made according to the accepted scaffold gene structures and, in the case of mutated exons, for authentic sequences.
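The scaffold selection step (one longest spliced isoform per gene name) can be sketched as follows. This is a minimal illustration that assumes each isoform is given as a `(gene_name, mrna_id, exons)` record with half-open exon intervals; the record layout, the function name, and the identifiers in the toy data are hypothetical, and ties are resolved here by keeping the first isoform seen, which the text does not specify.

```python
from typing import Dict, Iterable, List, Tuple

Exons = List[Tuple[int, int]]

def longest_isoforms(
    isoforms: Iterable[Tuple[str, str, Exons]]
) -> Dict[str, Tuple[str, Exons]]:
    """Keep, for every gene name, the isoform with the longest spliced mRNA."""
    best: Dict[str, Tuple[int, str, Exons]] = {}
    for gene, mrna_id, exons in isoforms:
        # Isoform length is computed on the spliced mRNA (sum of exon lengths).
        spliced_length = sum(end - start for start, end in exons)
        if gene not in best or spliced_length > best[gene][0]:
            best[gene] = (spliced_length, mrna_id, exons)
    return {gene: (mrna_id, exons) for gene, (_, mrna_id, exons) in best.items()}

# Made-up records for illustration only.
toy_isoforms = [
    ("GENE_A", "mrna_1", [(0, 100), (200, 300)]),
    ("GENE_A", "mrna_2", [(0, 100), (200, 300), (400, 550)]),
    ("GENE_B", "mrna_3", [(10, 60)]),
]
scaffolds = longest_isoforms(toy_isoforms)
```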

Sets of mutated exons

Sets of mutated exons included only internal exons affected by single-nucleotide substitutions in splice sites (from -3 to +6 for donor sites and from -15 to +2 for acceptor sites) leading to exon skipping (S-exons) or cryptic site activation (C-exons). The set of C-exons was also restricted to cryptic sites located in exons and introns adjacent to the mutated site. The set of C-exons with mutations in donor splice sites was obtained from [40], and contained 42 exons. The set of C-exons with mutations in acceptor sites was obtained from the DBASS3 database [39] and contained 72 exons. The set of S-exons was collected by a search for published examples of exon skipping in OMIM [49] and PubMed. The collected S-exons were identified in the set of RefSeq scaffolds. The final set contained, respectively, 67 and 42 S-exons with mutations in donor and acceptor sites. The sets of donor and acceptor S-exons are available as Additional files 1 and 2, respectively.

Sets of retained and constitutive (constitutively spliced out) introns

An intron retention event was scored if the high-throughput cDNA sequencing data contained an exon that exactly covered an exon-intron-exon chain in a RefSeq gene (Fig. 1). Such an intron was called a retained intron. All other introns were considered to be constitutive introns. Since parameters of flanking exons were analyzed, only introns between internal exons from the RefSeq scaffolds were considered. The final set consisted of 1197 retained and 137580 constitutive introns.
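The scoring rule for retention events can be expressed compactly in Python. The sketch below assumes that both the RefSeq scaffold and the aligned cDNA are available as lists of half-open `(start, end)` exon intervals on the same coordinate system; the function name and data layout are hypothetical and stand in for whatever alignment format is actually used.

```python
from typing import List, Tuple

Interval = Tuple[int, int]

def retained_introns(refseq_exons: List[Interval], cdna_exons: List[Interval]) -> List[Interval]:
    """Return introns of the scaffold that are retained in the cDNA.

    An intron between two internal scaffold exons counts as retained when
    one cDNA exon starts exactly at the start of the upstream exon and ends
    exactly at the end of the downstream exon.
    """
    cdna = set(cdna_exons)
    retained = []
    # Indices 1 .. len-3 keep both flanking exons internal (not first or last).
    for i in range(1, len(refseq_exons) - 2):
        upstream, downstream = refseq_exons[i], refseq_exons[i + 1]
        if (upstream[0], downstream[1]) in cdna:
            retained.append((upstream[1], downstream[0]))
    return retained

# Made-up coordinates: the cDNA exon (200, 500) spans scaffold exons 2 and 3.
toy_scaffold = [(0, 100), (200, 300), (400, 500), (600, 700), (800, 900)]
toy_cdna = [(0, 100), (200, 500), (600, 900)]
toy_retained = retained_introns(toy_scaffold, toy_cdna)
```

In the toy example the cDNA exon `(600, 900)` also spans an intron, but one of its flanking scaffold exons is terminal, so that event is not scored.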

Splice site scores

Scores of the donor and acceptor splicing sites were calculated using positional weight matrices covering positions from -3 to +6 (for donor sites) and from -15 to +2 (for acceptor sites). The positional nucleotide weights were calculated as in [50]: $W(b,m) = \log[N(b,m)+0.5] - 0.25 \sum_{i \in \{A,C,G,T\}} \log[N(i,m)+0.5]$, where $N(b,m)$ is the count of nucleotide $b$ in position $m$ in the training sample. The training sample was obtained from the EDAS database [6], and contained 4179 constitutive internal exons confirmed by at least 50 ESTs. The score of a donor site $(b_{-3},\ldots,b_{6})$, where $b_j$ are nucleotides, was then calculated as a sum of positional weights: $S(b_{-3},\ldots,b_{6}) = W(b_{-3},-3) + \ldots + W(b_{6},6)$, and similarly for scores of acceptor sites.
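The weight and score formulas can be illustrated with a small Python sketch. It is not the authors' implementation: the function names are hypothetical, the training sites are three made-up 9-mers rather than the EDAS sample, and the natural logarithm is assumed because the base of the logarithm is not stated.

```python
import math
from typing import Dict, List

NUCLEOTIDES = "ACGT"

def build_pwm(training_sites: List[str]) -> List[Dict[str, float]]:
    """Positional weights W(b, m) = log[N(b,m)+0.5] - 0.25 * sum_i log[N(i,m)+0.5]."""
    site_length = len(training_sites[0])
    pwm = []
    for m in range(site_length):
        # N(b, m): count of nucleotide b at position m in the training sample.
        counts = {b: 0 for b in NUCLEOTIDES}
        for site in training_sites:
            counts[site[m]] += 1
        mean_log = 0.25 * sum(math.log(counts[i] + 0.5) for i in NUCLEOTIDES)
        pwm.append({b: math.log(counts[b] + 0.5) - mean_log for b in NUCLEOTIDES})
    return pwm

def score_site(pwm: List[Dict[str, float]], site: str) -> float:
    """Site score S: the sum of positional weights over all positions."""
    return sum(column[b] for column, b in zip(pwm, site))

# Made-up donor-like 9-mers (positions -3..+6), for illustration only.
toy_training = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA"]
toy_pwm = build_pwm(toy_training)
consensus_score = score_site(toy_pwm, "CAGGTAAGT")
poor_score = score_site(toy_pwm, "TTTTTTTTT")
```

By construction the four weights in every column sum to zero, so a site scores above zero only when its nucleotides are, on balance, over-represented in the training sample.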

Densities of cis-acting elements

Putative cis-regulatory elements were identified in all internal exons and introns by several published methods. In particular, we searched for ESE motifs initially identified by SELEX (SF2/ASF, SC35, SRp40, SRp55) using ESEfinder [34]; 238 ESE hexamers predicted by RESCUE-ESE [35]; and 2060 ESE and 1018 ESS octamers predicted by PESX [36,37]. The densities of predicted regulatory elements were defined as the number of candidate ESEs and ESSs per base pair.
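For the fixed-length word lists (RESCUE-ESE hexamers, PESX octamers), the density calculation can be sketched as below. The function name is hypothetical, the single hexamer in the toy set is a placeholder rather than an entry taken from any published list, and counting overlapping occurrences is an assumption, since the text does not say how overlaps were handled.

```python
from typing import Set

def motif_density(sequence: str, motifs: Set[str]) -> float:
    """Number of motif occurrences per nucleotide of the sequence.

    All motifs are assumed to have the same length (e.g. hexamers or octamers);
    overlapping occurrences are each counted once per start position.
    """
    if not sequence or not motifs:
        return 0.0
    k = len(next(iter(motifs)))
    hits = sum(
        1 for start in range(len(sequence) - k + 1)
        if sequence[start:start + k] in motifs
    )
    return hits / len(sequence)

# Placeholder hexamer set and sequence, for illustration only.
toy_motifs = {"GAAGAA"}
toy_density = motif_density("GAAGAAGAAC", toy_motifs)
```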

Statistical analysis

The statistical significance of differences between distributions of all intron parameters was measured by the two-sample Kolmogorov-Smirnov test and Student's t-test. The only exception was the distribution of the intron ordinal number, where we used the χ2 test instead of the Kolmogorov-Smirnov test. The significance of differences between mutated exon parameters was measured by the Mann-Whitney test because of the small data set size. All these tests were performed using the R package [51].

Authors' contributions

MSG conceived the project. EZK collected and analyzed the data. MSG and EZK wrote the manuscript.

Additional file 1

List of skipped exons (S-exons) with mutations in donor sites. List of skipped exons (S-exons) with mutated donor sites: gene name, ordinal number of the skipped exon in the gene, exon sequence.

Additional file 2

List of skipped exons (S-exons) with mutations in acceptor sites. List of skipped exons (S-exons) with mutated acceptor sites: gene name, ordinal number of the skipped exon in the gene, exon sequence.
  49 in total

1.  Alternative splicing of intron 3 of the serine/arginine-rich protein 9G8 gene. Identification of flanking exonic splicing enhancers and involvement of 9G8 as a trans-acting factor.

Authors:  F Lejeune; Y Cavaloc; J Stevenin
Journal:  J Biol Chem       Date:  2000-11-28       Impact factor: 5.157

2.  Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs.

Authors:  Y Okazaki; M Furuno; T Kasukawa; J Adachi; H Bono; S Kondo; I Nikaido; N Osato; R Saito; H Suzuki; I Yamanaka; H Kiyosawa; K Yagi; Y Tomaru; Y Hasegawa; A Nogami; C Schönbach; T Gojobori; R Baldarelli; D P Hill; C Bult; D A Hume; J Quackenbush; L M Schriml; A Kanapin; H Matsuda; S Batalov; K W Beisel; J A Blake; D Bradt; V Brusic; C Chothia; L E Corbani; S Cousins; E Dalla; T A Dragani; C F Fletcher; A Forrest; K S Frazer; T Gaasterland; M Gariboldi; C Gissi; A Godzik; J Gough; S Grimmond; S Gustincich; N Hirokawa; I J Jackson; E D Jarvis; A Kanai; H Kawaji; Y Kawasawa; R M Kedzierski; B L King; A Konagaya; I V Kurochkin; Y Lee; B Lenhard; P A Lyons; D R Maglott; L Maltais; L Marchionni; L McKenzie; H Miki; T Nagashima; K Numata; T Okido; W J Pavan; G Pertea; G Pesole; N Petrovsky; R Pillai; J U Pontius; D Qi; S Ramachandran; T Ravasi; J C Reed; D J Reed; J Reid; B Z Ring; M Ringwald; A Sandelin; C Schneider; C A M Semple; M Setou; K Shimada; R Sultana; Y Takenaka; M S Taylor; R D Teasdale; M Tomita; R Verardo; L Wagner; C Wahlestedt; Y Wang; Y Watanabe; C Wells; L G Wilming; A Wynshaw-Boris; M Yanagisawa; I Yang; L Yang; Z Yuan; M Zavolan; Y Zhu; A Zimmer; P Carninci; N Hayatsu; T Hirozane-Kishikawa; H Konno; M Nakamura; N Sakazume; K Sato; T Shiraki; K Waki; J Kawai; K Aizawa; T Arakawa; S Fukuda; A Hara; W Hashizume; K Imotani; Y Ishii; M Itoh; I Kagawa; A Miyazaki; K Sakai; D Sasaki; K Shibata; A Shinagawa; A Yasunishi; M Yoshino; R Waterston; E S Lander; J Rogers; E Birney; Y Hayashizaki
Journal:  Nature       Date:  2002-12-05       Impact factor: 49.962

3.  Predictive identification of exonic splicing enhancers in human genes.

Authors:  William G Fairbrother; Ru-Fang Yeh; Phillip A Sharp; Christopher B Burge
Journal:  Science       Date:  2002-07-11       Impact factor: 47.728

4.  Prediction of transcription regulatory sites in Archaea by a comparative genomic approach.

Authors:  M S Gelfand; E V Koonin; A A Mironov
Journal:  Nucleic Acids Res       Date:  2000-02-01       Impact factor: 16.971

5.  Categorization and characterization of transcript-confirmed constitutively and alternatively spliced introns and exons from human.

Authors:  Francis Clark; T A Thanaraj
Journal:  Hum Mol Genet       Date:  2002-02-15       Impact factor: 6.150

6.  Nominal growth hormone pulses in otherwise normal masculine plasma profiles induce intron retention of overexpressed hepatic CYP2C11 with associated nuclear splicing deficiency.

Authors:  N A Pampori; B H Shapiro
Journal:  Endocrinology       Date:  2000-11       Impact factor: 4.736

7.  Structure and expression of the murine calcyon gene.

Authors:  Rujuan Dai; Clare Bergson
Journal:  Gene       Date:  2003-06-05       Impact factor: 3.688

8.  Intrinsic differences between authentic and cryptic 5' splice sites.

Authors:  Xavier Roca; Ravi Sachidanandam; Adrian R Krainer
Journal:  Nucleic Acids Res       Date:  2003-11-01       Impact factor: 16.971

9.  ESEfinder: A web resource to identify exonic splicing enhancers.

Authors:  Luca Cartegni; Jinhua Wang; Zhengwei Zhu; Michael Q Zhang; Adrian R Krainer
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

10.  Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans.

Authors:  Benjamin P Lewis; Richard E Green; Steven E Brenner
Journal:  Proc Natl Acad Sci U S A       Date:  2002-12-26       Impact factor: 11.205

