Alexander Churbanov, Igor Vorechovský, Chindo Hicks.
Abstract
BACKGROUND: Auxiliary splicing sequences play an important role in ensuring accurate and efficient splicing by promoting or repressing recognition of authentic splice sites. These cis-acting motifs have been termed splicing enhancers and silencers and are located both in introns and exons. They co-evolved into an intricate splicing code together with additional functional constraints, such as tissue-specific and alternative splicing patterns. We used orthologous exons extracted from the University of California Santa Cruz multiple genome alignments of human and 22 Tetrapoda organisms to predict candidate enhancers and silencers that have reproducible and statistically significant bias towards annotated exonic boundaries.
Year: 2009 PMID: 19889216 PMCID: PMC2777938 DOI: 10.1186/1471-2164-10-508
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Intersection between putative intronic enhancers found separately for primates and outgroup clades and for the entire Tetrapoda superclass.
| Primates 5'SS ISEs/ISSs | 62/4.96 | 59/18.84 | 577/24.73 | 105/65.14 |
| Primates 3'SS ISEs/ISSs | 25/7.68 | 278/28.46 | 58/47.67 | 1,687/130.76 |
| Vertebrates 5'SS ISEs/ISSs | 622/33.35 | 327/130.27 | 2,428/101.57 | 297/231.66 |
| Vertebrates 3'SS ISEs/ISSs | 127/93.11 | 3,166/366.87 | 297/231.66 | 6,436/480.96 |
Shown is the ratio between the actual intersection and the expected intersection of the sets under the null hypothesis (the expected intersection between the same number of randomly generated oligos). The intersection between the two sets of elements is calculated as the number of all possible longest common substrings (LCS) of the two compared elements a and b, with size |LCS| ≡ min(|a|, |b|), over ordered pairs (a, b) from the Cartesian product of the sets.
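The intersection measure described in this footnote can be sketched as follows. The condition |LCS| ≡ min(|a|, |b|) holds exactly when the shorter element occurs as a substring of the longer one, so the sketch counts ordered pairs that satisfy it. The function names, the choice to count each qualifying pair once, and the null model (uniform random nucleotides with the same element lengths) are assumptions made for illustration and are not taken from the paper.

```python
import random

def count_intersection(set_a, set_b):
    """Count ordered pairs (a, b) whose longest common substring has
    length min(len(a), len(b)), i.e. the shorter oligo is contained
    in the longer one. Each qualifying pair is counted once (assumption)."""
    count = 0
    for a in set_a:
        for b in set_b:
            shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
            if shorter in longer:
                count += 1
    return count

def expected_intersection(set_a, set_b, trials=100, seed=0):
    """Monte Carlo estimate of the intersection under a hypothetical null:
    both sets are replaced by the same number of random oligos of the same
    lengths, drawn uniformly from A, C, G, T (assumption)."""
    rng = random.Random(seed)

    def randomize(elements):
        return ["".join(rng.choice("ACGT") for _ in e) for e in elements]

    total = 0
    for _ in range(trials):
        total += count_intersection(randomize(set_a), randomize(set_b))
    return total / trials

# Toy example: "CG" and "ACGT" are contained in their partners, "TT" is not.
observed = count_intersection(["ACGT"], ["CG", "ACGTA", "TT"])
```

An entry of the form actual/expected would then pair `observed` with the value returned by `expected_intersection` for the same two sets.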
Intersection of predicted elements with the systematically identified elements reported in Table 1.
| 3/9.87 | 8/202.52 | 118/330.24 | 450/206.73 | 16/83.80 | ||||||
| 105/9.46 | 68/54.19 | 46/27.40 | ||||||||
| 3/2.90 | 4/5.75 | 8/13.81 | 14/8.64 | 2/3.47 | ||||||
| 4/0.84 | 0/4.61 | 38/22.64 | 19/14.17 | 4/5.70 | ||||||
| 0/30.64 | 183/173.03 | 662/614.92 | 337/384.94 | 422/156.04 | ||||||
| 156/34.31 | 25/35.42 | 83/190.50 | 213/164.59 | |||||||
| 32/8.89 | 2/29.87 | 68/42.25 | 28/26.45 | 13/10.65 | ||||||
| 0/0.21 | 1/1.10 | 19/12.98 | 6/8.12 | 5/3.27 | ||||||
Shown is the ratio between the actual intersection and the expected intersection of the sets under the null hypothesis (randomly generated oligos). The intersection between the two sets of elements is calculated as the number of all possible longest common substrings (LCS) of the two compared elements a and b, with size |LCS| ≡ min(|a|, |b|), over ordered pairs (a, b) from the Cartesian product of the sets.
Splicing regulatory elements previously predicted by systematic studies.
| Fairbrother, W.G., et al. [ | 238 hexamers as candidate ESEs |
| Zhang, X.H. and L.A. Chasin [ | Putative 2,069 octamers as exonic splicing enhancers and 974 octamers as exonic splicing silencers |
| Wang, Z., et al. [ | 133 ESS-containing decanucleotides |
| Yeo, G.W., E.L. Van Nostrand, and T.Y. Liang [ | 133 5'SS ISEs and 299 3'SS ISEs pentamers |
| Goren, A., et al. [ | 285 hexamers putative exonic splicing regulatory sequences |
| Zhang, C., et al. [ | Putative 1131 hexamers Exon-Identity Elements (EIEs) and 708 Intron-Identity Elements (IIEs) |
| Stadler, M.B., et al. [ | 380 hexamers as new candidate ESEs and 132 hexamers as new candidate ESSs |
| Wang, E.T., et al. [ | 187 5'SS ISEs/ISSs and 175 3'SS ISEs/ISSs hexamers supporting the tissue-specific splicing events |
Counts of conserved octamers in exonic proximity.
| Elements | 4,024 | 6,800 | Elements | 4,272 | 8,363 |
| Non- | 251,842 | 518,369 | Non- | 261,387 | 578,106 |
| Fisher 2-tail test: 1.81 × 10^-22 | Fisher 2-tail test: 1.59 × 10^-10 | ||||
| Element | 399 | 648 | Element | 6,537 | 11,385 |
| Non- | 251,842 | 518,369 | Non- | 261,387 | 578,106 |
| Fisher 2-tail test: 0.00025 | Fisher 2-tail test: 3.46 × 10^-51 | ||||
The set of all other possible elements was obtained by excluding the ISEs and ISSs supporting either 5' or 3' SSs from the set of all possible octamers. We counted cases where oligonucleotides stay entirely conserved versus cases where they change in at least one nucleotide position between pairs of sequences from the multiple sequence alignments; only motifs containing no gaps were considered. For 5'SS elements we considered a 20-nt window starting 16 nt downstream of the 5' exonic boundary in the human sequence, whereas for elements supporting the 3'SS we considered a 20-nt window ending 63 nt upstream of the 3' exonic boundary.
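The Fisher two-tailed tests reported in the table compare 2×2 counts of predicted elements versus all other octamers. A minimal standard-library sketch of such a test is given below. It uses the common convention of summing the probabilities of all tables that are no more likely than the observed one; whether the authors used this exact convention, and which table column corresponds to conserved versus changed octamers, is not stated in the text and is treated here as an assumption. The function name is hypothetical.

```python
from math import lgamma, exp

def _log_choose(n, k):
    # Natural log of the binomial coefficient C(n, k).
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact test for the 2x2 table [[a, b], [c, d]].
    Sums hypergeometric probabilities of all tables with the same margins
    that are at most as probable as the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    log_denom = _log_choose(n, col1)

    def log_prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return _log_choose(row1, x) + _log_choose(row2, col1 - x) - log_denom

    log_obs = log_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        lp = log_prob(x)
        if lp <= log_obs + 1e-7:  # small tolerance for floating-point ties
            total += exp(lp)
    return min(1.0, total)

# Small worked table: p = 34/70 by direct enumeration.
p_small = fisher_two_tailed(3, 1, 1, 3)
```

A block of the table could be passed in the same way, for example `fisher_two_tailed(4024, 6800, 251842, 518369)`, and the result compared against the reported value.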
Figure 1. Four segments for testing PU values for the predicted elements. The segments next to the 5'SS and 3'SS were chosen to extend 50 nt into the intron from the corresponding exonic boundaries, not including the ± 30 nt context.
Average PU for the predicted octamer elements surrounded by ± 30 nt context analyzed in various segments as shown in Figure 2.
| 3,946 | 0.100/0.100 | 3,954 | 0.185/0.191 | 4,014 | 0.189/0.196 | 4,017 | 0.148/0.154 | |
| 1,361 | 0.174/0.184 | 700 | 0.241/0.247 | 3,993 | 0.182/0.173 | 1,744 | 0.157/0.144 | |
| 3,930 | 0.165/0.165 | 4,061 | 0.205/0.206 | 3,989 | 0.190/0.187 | 3,984 | 0.182/0.174 | |
| 3,954 | 0.144/0.163 | 3,987 | 0.120/0.127 | 4,039 | 0.194/0.185 | 4,050 | 0.133/0.132 | |
| 4,004 | 0.128/0.130 | 3,966 | 0.163/0.160 | 4,053 | 0.152/0.146 | 4,024 | 0.134/0.123 | |
We classified the predicted elements surrounded by ± 30 nt context according to segments of their location. The mean PU values calculated according to [2] for wild type and dinucleotide reshuffled contexts are followed by significant P-values obtained with the Wilcoxon two-sided rank-sum tests. Only the P-values rejecting the null hypotheses (P < 1%) that the distribution is similar in the two groups of PU values are shown.
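The comparison of PU values between wild-type and dinucleotide-reshuffled contexts relies on a two-sided Wilcoxon rank-sum test. The sketch below shows one way such a test can be computed with the standard library, using the normal approximation with average ranks for ties and no tie or continuity correction. These choices, and the function name, are assumptions; the text does not say which implementation the authors used.

```python
from math import erfc, sqrt

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).
    Returns (z, p). Tied values receive their average rank."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        n_from_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += avg_rank * n_from_x
        i = j
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2.0
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - expected) / sd
    p = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Toy PU-like values: two fully separated groups of three.
z, p = rank_sum_test([1, 2, 3], [4, 5, 6])
significant = p < 0.01  # the 1% threshold stated in the footnote
```

With groups this small the normal approximation is rough; it is better suited to the thousands of values per segment listed in the table.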
Figure 2. Location of genomic regions used for comparative analysis. (A) Statistical significance tests for intronic enhancing/silencing elements surrounding an exon. Blue is the null-hypothesis region and red is the region of statistical significance associated with exon proximity. The red region is specifically located outside the area associated with donor or acceptor signal consensuses [36]. (B) Statistical significance test for the ESE/ESS elements supporting exon definition. This strategy cancels the statistical biases associated with protein-coding potential, best characterized by the hexamer statistics [41], and focuses on the essential difference between the exonic flanks, normally enriched with ESEs [42], and the middle section, supposedly depleted of such elements. (C) The differential strategy allows detection of enhancing and silencing elements that have substantially different concentrations in the vicinity of a strong vs. weak SS as defined by the Bayesian SS sensor [36]. The score from the sensor is measured on a discrete scale from 1 to 5, where 1 stands for a weak signal and 5 for a strong one.
Figure 3. Location of the counting regions used for oligonucleotide scoring relative to exonic flanks. All exons too short to accommodate the regions are disregarded. (A) The region arrangement for the counting strategies shown in Figures 2 (A) and (B), where the Skip value is set to 0 nt for the first comparative measurement and 29 nt for the second. The second comparative measurement is necessary to predict active intronic elements that have maximum enhancing/silencing potential at a certain optimal distance from the exonic boundary, such as polyG signals [26]. The second measurement also trades the smaller number of longer exons considered for a greater chance of detecting a discrepancy in element density between the middle of the exons and the flanks. (B) The region arrangement corresponding to the differential test strategy shown in Figure 2 (C). (C) The tiling strategy within a region increases the variety of elements sampled in a counting round. Three different colors are used to show which oligo within a region gets sampled in three consecutive statistical tests (red in the first test, green in the second test, blue in the third test). This strategy reduces the chance of sampling the same oligo multiple times when it is conserved at a certain position in closely related organisms.