Eva Fernández Cáceres, Laurence D Hurst.
Abstract
BACKGROUND: In humans, much of the information specifying splice sites is not at the splice site. Exonic splice enhancers are one of the principal non-splice site motifs. Four high-throughput studies have provided a compendium of motifs that function as exonic splice enhancers, but only one, RESCUE-ESE, has been generally employed to examine the properties of enhancers. Here we consider these four datasets to ask whether there is any consensus on the properties and impacts of exonic splice enhancers.
Year: 2013 PMID: 24359918 PMCID: PMC4054783 DOI: 10.1186/gb-2013-14-12-r143
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Venn diagram showing overlap between datasets in hexamers identified as possible ESEs (a) using Ke-ESE400 and (b) using Ke-ESE. The great majority of ESEs are unique to any given dataset. For example, 47.06%, 45.80%, 67.72% and 74.75% of motifs are unique to RESCUE, PESE, ESR and Ke-ESE400, respectively.
The extent of overlap between each dataset in pairwise combination
| Dataset 1 | Dataset 2 | n1 | n2 | O | E | F | Z score | P value |
|---|---|---|---|---|---|---|---|---|
| RESCUE | PESE | 238 | 238 | 75 | 13.8 | 5.42 | 17.3 | |
| RESCUE | ESR | 238 | 285 | 55 | 16.6 | 3.32 | 10.1 | |
| RESCUE | Ke-ESE400 | 238 | 400 | 54 | 23.2 | 2.32 | 6.8 | |
| PESE | ESR | 238 | 285 | 48 | 16.6 | 2.90 | 8.9 | |
| PESE | Ke-ESE400 | 238 | 400 | 65 | 23.2 | 2.80 | 9.2 | |
| ESR | Ke-ESE400 | 285 | 400 | 33 | 27.8 | 1.19 | 1.0 | 0.12195 |
| RESCUE | Ke-ESE | 238 | 1182 | 125 | 68.7 | 1.82 | 8.2 | |
| PESE | Ke-ESE | 238 | 1182 | 137 | 68.7 | 1.99 | 9.9 | |
| ESR | Ke-ESE | 285 | 1182 | 98 | 82.2 | 1.19 | 2.1 | 0.015 |
n1 = number of motifs in dataset 1; n2 = number of motifs in dataset 2; O = number of motifs in common between dataset 1 and dataset 2; E = expected overlap = (n1 * n2)/T, where T is the total number of possible hexamers, that is, 4,096; F = overlap factor = O/E; a factor >1 indicates more overlap than expected for two independent groups. The Z score is the difference between O and E normalised by the standard deviation (derived from simulation). P values in bold are those significant after Bonferroni correction, assuming P <0.05/9.
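The quantities defined in this legend can be reproduced with a short sketch. The Python below is illustrative only: the function names `overlap_stats` and `simulated_z`, the number of simulations and the random seed are assumptions, and the simulation simply draws two independent random hexamer sets of sizes n1 and n2, since the legend does not describe the simulation in more detail.

```python
import random
import statistics

T = 4 ** 6  # total number of possible hexamers (4,096)

def overlap_stats(n1, n2, observed, total=T):
    """Return (E, F): expected overlap and overlap factor O/E."""
    expected = n1 * n2 / total
    return expected, observed / expected

def simulated_z(n1, n2, observed, total=T, n_sim=2000, seed=0):
    """Z = (O - E) / sd, with sd estimated from random, independent sets.

    Assumption: each simulated dataset is a uniform random sample of
    hexamers without replacement.
    """
    rng = random.Random(seed)
    universe = range(total)
    overlaps = []
    for _ in range(n_sim):
        a = set(rng.sample(universe, n1))
        b = rng.sample(universe, n2)
        overlaps.append(sum(1 for h in b if h in a))
    expected = n1 * n2 / total
    return (observed - expected) / statistics.stdev(overlaps)

# RESCUE versus PESE row: n1 = 238, n2 = 238, O = 75
e, f = overlap_stats(238, 238, 75)
z = simulated_z(238, 238, 75)
bonferroni_threshold = 0.05 / 9  # nine pairwise comparisons
```

For the RESCUE versus PESE row this gives E of about 13.8 and F of about 5.42, in line with the tabulated values; the simulated Z score varies slightly with the seed.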
Nucleotide content of the ESE datasets
| A | 0.478 | 0.34 | 0.277 | 0.221 | 0.212 | 0.345 | 0.466 | 0.398 | 0.497 |
| G | 0.252 | 0.299 | 0.246 | 0.333 | 0.318 | 0.288 | 0.317 | 0.299 | 0.321 |
| T | 0.13 | 0.134 | 0.249 | 0.138 | 0.179 | 0.162 | 0.099 | 0.13 | 0.074 |
| C | 0.14 | 0.228 | 0.229 | 0.308 | 0.29 | 0.205 | 0.117 | 0.173 | 0.108 |
| AG | 0.73 | 0.639 | 0.522 | 0.554 | 0.530 | 0.633 | 0.784 | 0.697 | 0.818 |
| GC | 0.392 | 0.527 | 0.475 | 0.641 | 0.609 | 0.493 | 0.435 | 0.472 | 0.429 |
| 0.035 |
P value is for a binomial test on purine content using absolute counts of nucleotides with a null of 50%. Data in bold are significant after Bonferroni correction (P <0.05/9).
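A minimal sketch of such a test follows, assuming the purine count is taken over every nucleotide of every hexamer in a dataset and that the test is two-sided; neither detail is stated in the legend. The function names and the three example hexamers are hypothetical.

```python
from math import comb

def purine_count(hexamers):
    """Count A and G nucleotides across all hexamers."""
    return sum(base in "AG" for h in hexamers for base in h.upper())

def binomial_two_sided_half(successes, trials):
    """Exact two-sided binomial P value against a null proportion of 0.5.

    With a symmetric null the two-sided P value is twice the smaller
    tail, capped at 1.
    """
    tail = min(successes, trials - successes)
    lower = sum(comb(trials, k) for k in range(tail + 1))
    return min(1.0, 2 * lower / 2 ** trials)

# Hypothetical toy dataset of three hexamers (18 nucleotides in total)
toy = ["GAAGAA", "AGGAAG", "TTCCTT"]
n_purine = purine_count(toy)
p_value = binomial_two_sided_half(n_purine, 6 * len(toy))
```

Exact integer arithmetic is used so that very small P values from large nucleotide counts are not lost to floating-point underflow before the final division.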
Purine content in and out of ESEs within the 50 bp at both exon ends
| ESR | 0.5441842 | 0.5000044 | 0.5416667 | 0.5 |
| INT2 | 0.6063711 | 0.4571333 | 0.6122449 | 0.4561404 |
| INT2.400 | 0.6612339 | 0.4584617 | 0.6666667 | 0.4571429 |
| INT3 | 0.7409292 | 0.4755146 | 0.75 | 0.4754098 |
| INT3.400 | 0.7881002 | 0.4873524 | 0.8125 | 0.4871795 |
| Ke-ESE | 0.5532219 | 0.4672781 | 0.5529412 | 0.4655172 |
| Ke-ESE400 | 0.6162153 | 0.4861662 | 0.6206897 | 0.484375 |
| PESE | 0.6167329 | 0.4687956 | 0.6190476 | 0.4693878 |
| RESCUE | 0.705773 | 0.4359836 | 0.7142857 | 0.4361702 |
We performed a paired Wilcoxon test for each exon comparing purine content in ESE and non-ESE sequence. All tests are highly significant before and after Bonferroni correction and are not shown.
Distribution of ESEs between introns, exon core and exon flanks
| ESR | 0.453 | 0.533 | 0.0016 | 0.0019 | 0.541 | 0.524 | 0.0019 | 0.0018 | ||
| INT2 | 0.310 | 0.447 | 0.001 | 0.0014 | 0.460 | 0.437 | 0.0015 | 0.0014 | ||
| INT2.400 | 0.201 | 0.305 | 0.001 | 0.0016 | 0.327 | 0.298 | 0.0017 | 0.0016 | ||
| INT3 | 0.106 | 0.171 | 0.0016 | 0.002 | 0.187 | 0.165 | 0.0023 | 0.0019 | ||
| INT3.400 | 0.069 | 0.117 | 0.0013 | 0.002 | 0.131 | 0.113 | 0.0024 | 0.0021 | ||
| Ke-ESE | 0.503 | 0.701 | 0.0004 | 0.0006 | 0.697 | 0.706 | 0.0006 | 0.0006 | ||
| Ke-ESE400 | 0.161 | 0.308 | 0.0004 | 0.0008 | 0.307 | 0.309 | 0.0007 | 0.0008 | ||
| PESE | 0.256 | 0.383 | 0.0011 | 0.0016 | 0.385 | 0.378 | 0.0016 | 0.0016 | ||
| RESCUE | 0.217 | 0.286 | 0.0009 | 0.0012 | 0.326 | 0.281 | 0.0014 | 0.0012 |
Analysis of the ESE density in introns and in exon cores and flanks, where introns are represented by their terminal 100 bases (200 bp in total), exon flanks are the 50 bp at either end of the exon, and the exon core is the 100 bp in the centre of the exon.
a: P value for the Mann Whitney U test comparing ESE density between intron flanks and exons (first and last 50 bp plus the central 100 bp).
b: P value for the Mann Whitney U test comparing exon flanks and exon cores. We also provide data on hexamer density per hexamer to enable visual comparison between different datasets, but these were not employed for any statistics. P values in bold are significant after Bonferroni correction, with P <0.05/9.
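The region definitions in this legend can be sketched as follows. This is an illustration under stated assumptions: density is taken here as the fraction of nucleotides covered by at least one ESE hexamer, which the legend does not spell out, and the per-hexamer figure is taken as the density divided by the number of hexamers in the dataset. All function names are hypothetical.

```python
def ese_density(seq, ese_set, k=6):
    """Fraction of nucleotides in seq covered by at least one ESE k-mer."""
    seq = seq.upper()
    if len(seq) < k:
        return 0.0
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in ese_set:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(seq)

def exon_regions(exon, flank=50, core=100):
    """Split an exon into 5' flank, central core and 3' flank.

    Only exons longer than 2 * flank + core (200 bp by default) are
    accepted, so the three regions never overlap.
    """
    if len(exon) <= 2 * flank + core:
        raise ValueError("exon too short for non-overlapping regions")
    start = len(exon) // 2 - core // 2
    return exon[:flank], exon[start:start + core], exon[-flank:]

def per_hexamer_density(density, n_hexamers):
    """Normalise a density by dataset size for cross-dataset comparison."""
    return density / n_hexamers
```

Dividing by dataset size only makes values visually comparable between a 238-motif set and a 1,182-motif set; as the legend notes, the normalised values are not used in any test.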
Paired test to examine the hypothesis that the ESE usage at exon flanks is higher than at the core of the same exon
| ESR | 0.0067 | 0.016 |
| INT2 | ||
| INT2.400 | ||
| INT3 | ||
| INT3.400 | ||
| Ke-ESE | 0.46 | 0.81 |
| Ke-ESE400 | 0.013 | |
| PESE | ||
| RESCUE |
Here we compare the first and last 50 bp of an exon with 100 bp in the centre of the exon (using exons >200 bp, n = 3,494). Only those in bold are significant after Bonferroni correction (with P <0.05/9).
Figure 2. The ability of each set of ESEs to predict trends in relative synonymous codon usage. We plot the difference in HPI versus the difference in slope of codon usage, as one approaches a boundary, for all pairs of synonymous codons. A negative slope implies a codon is enriched near boundaries. Thus we expect those codons with a high HPI to have a more negative slope, and hence a large difference in HPI to be reflected in a large negative difference in slope between two synonymous codons.
The ability of each set of ESEs to predict trends in relative synonymous codon usage
| ESR | 54 | 33 | 0.016 | -0.21 | 0.056 | 14.04 | |
| INT2 | 46 | 41 | 0.33 | -0.18 | 0.1 | 6.82 | <0.05 |
| INT2.400 | 57 | 30 | 0.0025 | -0.14 | 0.2 | 15.2 | |
| INT3 | 56 | 31 | 0.0048 | -0.24 | 0.027 | 17.90 | |
| INT3.400 | 60 | 27 | 0.00026 | -0.18 | 0.1 | 21.11 | |
| Ke-ESE | 23 | 64 | 1 | 0.0015 | 0.99 | 0.02 | ns |
| Ke-ESE400 | 23 | 64 | 1 | -0.091 | 0.4 | 1.83 | ns |
| PESE | 49 | 38 | 0.14 | 0.00033 | 1 | 3.93 | ns |
| RESCUE | 66 | 21 | 7.10E-07 | -0.31 | 0.0031 | 39.9 |
Here we ask whether: (a) each ESE dataset can predict which of two synonymous codons is preferred near a boundary and which is relatively preferred in ESEs, assayed by their HPI scores; and (b) whether the extent of the difference in tendency to be found in ESEs predicts the degree of difference in the preference as one approaches exon ends. Regarding (a), the expectation is that, orientating all comparisons such that the difference in HPI >0, the difference in slope should be negative. We thus ask whether there are more negative values than positives under a directional binomial test. As regards (b), we expect a negative correlation: a codon strongly preferred in ESEs should be relatively strongly enriched near a boundary, hence a big difference in the slope of the codon usage near the boundary. We compute an overall P value combining the P values of the two tests using Fisher’s method to generate a chi squared value, with 2 degrees of freedom. Those indicated in bold are significant after Bonferroni correction (P <0.05/9).
Figure 3. Rate of evolution of ESE and non-ESE sequence as a function of the distance from an exon boundary.
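The two-step procedure can be sketched in Python as below. This is an illustration, not the authors' code: the function names are hypothetical, and the degrees of freedom are left as a parameter because the legend states 2, whereas the textbook form of Fisher's method uses 2k degrees of freedom for k combined P values. The closed-form survival function used here holds only for even degrees of freedom.

```python
from math import comb, exp, factorial, log

def binomial_one_sided(n_negative, n_total):
    """P(X >= n_negative) for X ~ Binomial(n_total, 0.5)."""
    upper = sum(comb(n_total, k) for k in range(n_negative, n_total + 1))
    return upper / 2 ** n_total

def fisher_statistic(p_values):
    """Fisher's combined statistic: -2 * sum(ln p)."""
    return -2.0 * sum(log(p) for p in p_values)

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi squared variable with even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))

# ESR row: 54 negative and 33 positive slope differences, correlation P = 0.056
p_sign = binomial_one_sided(54, 54 + 33)
chi2 = fisher_statistic([0.016, 0.056])
p_combined = chi2_sf_even_df(chi2, df=4)  # df=2 would follow the legend as written
```

With the ESR row's two tabulated P values (0.016 and 0.056), `fisher_statistic` returns about 14.04, matching the chi squared value in the table.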
Rate of evolution of ESE and (a) non-ESE sequence and (b) pseudoESE sequence at four-fold degenerate sites
| A | ||||
| ESR | 0.3 | 0.39 | -23.1 | |
| INT2.400 | 0.32 | 0.36 | -11.1 | |
| INT3 | 0.3 | 0.36 | -16.7 | |
| INT3.400 | 0.3 | 0.35 | -14.2 | |
| INT3_ESR | 0.3 | 0.35 | -14.2 | |
| INT3_ESR_400 | 0.29 | 0.35 | -17.1 | |
| Ke-ESE | 0.35 | 0.34 | 2.9 | |
| Ke-ESE400 | 0.38 | 0.34 | 11.8 | |
| PESE | 0.31 | 0.37 | -16 | |
| RESCUE | 0.32 | 0.35 | -8.5 | |
| B | | | | |
| ESR | 0.3 | 0.35 | -14.3 | |
| INT2.400 | 0.32 | 0.35 | -8.5 | |
| INT3 | 0.3 | 0.35 | -14.3 | |
| INT3.400 | 0.3 | 0.34 | -11.8 | |
| INT3_ESR | 0.3 | 0.34 | -11.8 | |
| INT3_ESR_400 | 0.29 | 0.34 | -14.7 | |
| Ke-ESE | 0.35 | 0.35 | 0 | |
| Ke-ESE400 | 0.38 | 0.36 | 5.6 | |
| PESE | 0.31 | 0.35 | -11.4 | |
| RESCUE | 0.32 | 0.35 | -8.5 | |
We consider the proportion of four-fold degenerate sites that are changed or unchanged when comparing mouse-human alignments. We split sites by whether in human the sequence matches an ESE or not (Table A) or ESE versus pseudoESE (Table B). We then perform a Wilcoxon paired test considering the rate of evolution of ESE and non-ESE or pseudoESE at each position away from an exon boundary. We also present the % difference between the medians, this being (median for ESE - median for non-ESE)/median for non-ESE. All tests are significant after Bonferroni correction (for consistency the P values are shown in bold).
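As a worked illustration of the percentage difference defined in this legend, the following sketch takes per-position rates for the two classes and compares their medians. The function name and the list-based inputs are assumptions for illustration only.

```python
from statistics import median

def percent_difference(ese_rates, other_rates):
    """100 * (median ESE - median other) / median other."""
    m_ese = median(ese_rates)
    m_other = median(other_rates)
    return 100.0 * (m_ese - m_other) / m_other

# Using the tabulated ESR medians from Table A (0.30 versus 0.39)
esr_diff = round(percent_difference([0.3], [0.39]), 1)
```

A negative value means that sites within ESEs evolve more slowly than the comparison class; for the ESR medians above the result is -23.1, as tabulated.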
The proportion of sites with a SNP depending on distance from an exon boundary (a) comparing ESE and non-ESE and (b) comparing ESE and nucleotide matched pseudoESE
| A | ||||
| RESCUE | 0.03 | 0.036 | -16.7 | |
| PESE | 0.031 | 0.036 | -13.9 | |
| ESR | 0.031 | 0.038 | -18.4 | |
| Ke-ESE400 | 0.04 | 0.032 | 25 | |
| Ke-ESE | 0.036 | 0.03 | 20 | |
| INT3.400 | 0.029 | 0.035 | -17.1 | |
| INT3 | 0.029 | 0.035 | -17.1 | |
| INT2.400 | 0.031 | 0.036 | -13.9 | |
| INT2 | 0.032 | 0.036 | -11.1 | |
| B | ||||
| RESCUE | 0.03 | 0.031 | -3.2 | |
| PESE | 0.031 | 0.033 | -6.1 | |
| ESR | 0.031 | 0.034 | -8.8 | |
| Ke-ESE400 | 0.04 | 0.036 | 11.1 | |
| Ke-ESE | 0.036 | 0.035 | 2.9 | |
| INT3.400 | 0.029 | 0.034 | -14.7 | |
| INT3 | 0.029 | 0.035 | -17.1 | |
| INT2.400 | 0.031 | 0.035 | -11.4 | |
| INT2 | 0.032 | 0.035 | -8.6 | |
Statistics are from a paired test in which we compare the proportion of sites with a SNP within an ESE with sites equidistant from an exon junction but (a) not in an ESE or (b) in a pseudoESE, using a Wilcoxon paired test. Percentage difference is defined as (median for ESE - median for non-ESE)/median for non-ESE. All tests are significant after Bonferroni correction (for consistency the P values are shown in bold).
Figure 4. SNP density in and out of ESEs as a function of the distance from the exon boundary. ESE data are in orange, non-ESE data in blue.
The correlation between splice site strength and ESE density at 5’ and 3’ ends of exons
| | | | ||
|---|---|---|---|---|
| ESR | -0.047 | -0.059 | ||
| INT2 | -0.098 | -0.108 | ||
| INT2.400 | -0.099 | -0.092 | ||
| INT3 | -0.093 | -0.075 | ||
| INT3.400 | -0.094 | -0.070 | ||
| INT3_ESR | -0.091 | -0.060 | ||
| INT3_ESR_400 | -0.083 | -0.054 | ||
| INT3_RESCUE | -0.078 | -0.079 | ||
| INT3_RESCUE_400 | -0.074 | -0.065 | ||
| Ke-ESE | -0.075 | -0.129 | ||
| Ke-ESE400 | -0.080 | -0.126 | ||
| PESE | -0.082 | -0.097 | ||
| RESCUE | -0.099 | -0.054 |
All P values are significant after Bonferroni correction (shown in bold for consistency).
ESE density in alternative and constitutive exons
| | | | | |||
|---|---|---|---|---|---|---|
| ESR | 0.84 | 0.15 | 0.42 | 0.92 | 0.58 | 0.08 |
| INT2 | 0.059 | 0.21 | 0.97 | 0.90 | 0.03 | 0.10 |
| INT2.400 | 0.48 | 0.34 | 0.76 | 0.83 | 0.24 | 0.17 |
| INT3 | 0.325 | 0.832 | 0.84 | 0.59 | 0.16 | 0.41 |
| INT3.400 | 0.989 | 0.8361 | 0.49 | 0.58 | 0.51 | 0.41 |
| Ke-ESE | 0.99 | 0.99 | ||||
| Ke-ESE400 | 0.027 | 0.019 | 0.99 | 0.99 | 0.014 | 0.0096 |
| PESE | 0.022 | 0.99 | 0.99 | 0.0111 | ||
| RESCUE | 0.029 | 0.0413 | 0.015 | 0.02 | 0.99 | 0.98 |
Numbers represent P values from Mann Whitney U tests comparing ESE density in alternative and constitutive exons. We present the results of the two-tailed test and the two alternative one-tailed tests. P values in bold are significant after Bonferroni correction (P <0.05/18). We compare 8,406 alternative exons and 3,249 constitutive ones.
ESE density in conserved and non-conserved exons
| | | | | |||
|---|---|---|---|---|---|---|
| ESR | 0.91 | 0.78 | 0.45 | 0.39 | 0.55 | 0.61 |
| INT2 | 0.36 | 0.15 | 0.82 | 0.075 | 0.18 | 0.92 |
| INT2.400 | 0.18 | 0.41 | 0.91 | 0.21 | 0.09 | 0.79 |
| INT3 | 0.98 | 0.67 | 0.49 | 0.66 | 0.51 | 0.34 |
| INT3.400 | 0.53 | 0.61 | 0.74 | 0.69 | 0.261 | 0.31 |
| Ke-ESE | 0.09 | 0.0029 | 0.044 | 0.96 | 0.99 | |
| Ke-ESE400 | 0.86 | 0.34 | 0.43 | 0.17 | 0.57 | 0.83 |
| PESE | 0.43 | 0.45 | 0.78 | 0.22 | 0.21 | 0.78 |
| RESCUE | 0.23 | 0.047 | 0.88 | 0.98 | 0.12 | 0.024 |
Numbers represent P values from Mann Whitney U tests comparing ESE density in conserved and non-conserved exons. We present the results of the two-tailed test and the two alternative one-tailed tests. P values in bold are significant after Bonferroni correction (P <0.05/18). We compare 3,413 conserved exons and 3,249 non-conserved ones. Conservation is defined by presence/absence in mouse.
The correlation between ESE density and the size of the flanking intron, for exons with flanking introns <1,501 bp (columns 1 and 2), and for all exons (columns 3 and 4)
| | | | ||
|---|---|---|---|---|
| ESR | 0.04459 | 0.026 | ||
| INT2 | 0.01595 | 9.5e-02 | 0.012 | 0.0712 |
| INT2.400 | 0.04758 | 0.048 | ||
| INT3 | 0.05225 | 0.051 | ||
| INT3.400 | 0.05107 | 0.052 | ||
| Ke-ESE | -0.07193 | -0.083 | ||
| Ke-ESE400 | -0.05551 | -0.052 | ||
| PESE | -0.00535 | 5.75e-01 | -0.028 | |
| RESCUE | 0.10278 | 0.114 |
P values in bold are significant after Bonferroni correction (P <0.05/18).