Abstract
Accurate and efficient splicing is of crucial importance for highly transcribed intron-containing genes (ICGs) in rapidly replicating unicellular eukaryotes such as the budding yeast Saccharomyces cerevisiae. We characterized the 5' and 3' splice sites (ss) by position weight matrix scores (PWMSs), which are highest for the consensus sequence and lowest for splice sites that differ most from it, and used PWMS as a proxy for splicing strength. HAC1, which is known to be spliced by a nonspliceosomal mechanism, has the most negative PWMS for both its 5' ss and 3' ss. Several genes that are under strong splicing regulation and require additional splicing factors also have small or negative PWMS values. Splicing strength is higher for highly transcribed ICGs than for lowly transcribed ICGs, and higher for transcripts that bind strongly to spliceosomes than for those that bind weakly. The 3' splice site features a prominent poly-U tract before the 3' AG. Our results suggest the potential of using PWMS as a screening tool for ICGs that are either spliced by a nonspliceosomal mechanism or under strong splicing regulation in yeast and other fungal species.
Year: 2011 PMID: 22162666 PMCID: PMC3226532 DOI: 10.1155/2011/212146
Source DB: PubMed Journal: Comp Funct Genomics ISSN: 1531-6912
The names and intron positions of 24 yeast protein-coding genes which have introns in their 5′-UTRs.
| Syst. name(1) | Std name(2) | Chr | Position(3) | Genome position | Strand(4) |
|---|---|---|---|---|---|
| YBL072C | RPS8A | 2 | -315..-8 | 89440..89133 | C |
| YBL092W | RPL32 | 2 | -333..-1 | 45645..45977 | W |
| YBR089C-A | NHP6B | 2 | -384..-28 | 426873..426517 | C |
| YDL061C | RPS29B | 4 | -421..-13 | 341219..340811 | C |
| YDL137W | ARF2 | 4 | -371..-40 | 216158..216489 | W |
| YDL189W | RBS1 | 4 | -138..-40 | 122078..122176 | W |
| YDR099W | BMH2 | 4 | -826..-84 | 652781..653523 | W |
| YER102W | RPS8B | 5 | -367..-8 | 362733..363092 | W |
| YER131W | RPS26B | 5 | -361..-1 | 423591..423951 | W |
| YFR032C-A | RPL29 | 6 | -334..-4 | 223771..223441 | C |
| YGL031C | RPL24A | 7 | -463..-8 | 438397..437942 | C |
| YGL187C | COX4 | 7 | -354..-13 | 150525..150184 | C |
| YGL189C | RPS26A | 7 | -378..-11 | 148966..148599 | C |
| YGR027C | RPS25A | 7 | -327..-16 | 534785..534474 | C |
| YGR148C | RPL24B | 7 | -399..-8 | 788178..787787 | C |
| YIL123W | SIM1 | 9 | -489..-3 | 127662..128148 | W |
| YJL130C | URA2 | 10 | -385..-66 | 172752..172433 | C |
| YKL150W | MCR1 | 11 | -144..-57 | 166400..166487 | W |
| YKL186C | MTR2 | 11 | -167..-14 | 93465..93312 | C |
| YLR333C | RPS25B | 12 | -436..-14 | 796335..795913 | C |
| YLR367W | RPS22B | 12 | -564..-8 | 855878..856434 | W |
| YLR388W | RPS29A | 12 | -493..-6 | 898158..898645 | W |
| YNL066W | SUN4 | 14 | -358..-13 | 501157..501502 | W |
| YPL230W | USV1 | 16 | -93..-19 | 115219..115293 | W |
(1)Systematic name.
(2)Standard name.
(3)Site numbering relative to start codon.
(4)C—Crick strand (reverse complement), W—Watson strand.
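The two coordinate columns in the table describe the same intron, once relative to the start codon and once on the chromosome, so their spans should have equal length and the genome coordinates should descend for Crick-strand genes. The following minimal sketch checks this for a few rows copied from the table; the helper names `span_length` and `check_row` are illustrative and not part of the original work.

```python
def span_length(span: str) -> int:
    """Length in nucleotides of an inclusive 'start..end' span, in either orientation."""
    start, end = (int(x) for x in span.split(".."))
    return abs(end - start) + 1

# (standard name, position relative to start codon, genome position, strand)
ROWS = [
    ("RPS8A", "-315..-8", "89440..89133", "C"),
    ("RPL32", "-333..-1", "45645..45977", "W"),
    ("NHP6B", "-384..-28", "426873..426517", "C"),
    ("BMH2", "-826..-84", "652781..653523", "W"),
]

def check_row(name, rel, genome, strand):
    """True when both spans have the same length and the orientation fits the strand."""
    g_start, g_end = (int(x) for x in genome.split(".."))
    # Watson-strand coordinates ascend; Crick-strand coordinates descend.
    orientation_ok = (g_start < g_end) if strand == "W" else (g_start > g_end)
    return span_length(rel) == span_length(genome) and orientation_ok

LENGTHS = {name: span_length(rel) for name, rel, _, _ in ROWS}
ALL_OK = all(check_row(*row) for row in ROWS)
```

For these four rows the intron lengths come out as 308, 333, 357, and 743 nucleotides, and both coordinate systems agree.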
Yeast genes whose first exon (i.e., the coding part of the first exon) is shorter than five nucleotides.
| Gene | PWM* | 1st Exon len | Sequence |
|---|---|---|---|
| BET4 | 4.2977 | 3 | AUG |
| BOS1 | 3.5760 | 3 | AUG |
| DCN1 | 6.5363 | 3 | AUG |
| MND1 | 8.1685 | 3 | AUG |
| MPT5 | 8.5055 | 3 | AUG |
| PSP2 | 8.4546 | 4 | AUGG |
| QCR9 | 5.7592 | 3 | AUG |
| RPL13A | 6.9991 | 4 | AUGG |
| RPL13B | 8.6752 | 4 | AUGG |
| RPL19A | 6.9298 | 2 | AU |
| RPL19B | 11.7762 | 2 | AU |
| RPL20A | 9.7145 | 1 | A |
| RPL20B | 8.0214 | 1 | A |
| RPL2A | 12.0769 | 4 | AUGG |
| RPL2B | 9.7326 | 4 | AUGG |
| RPL30 | 8.1799 | 3 | AUG |
| RPL35A | 7.3834 | 3 | AUG |
| RPL35B | 7.8326 | 3 | AUG |
| RPL42A | 9.2392 | 4 | AUGG |
| RPL42B | 7.0558 | 4 | AUGG |
| RPL43A | 9.9976 | 2 | AU |
| RPL43B | 12.0547 | 2 | AU |
| RPS17A | 9.1269 | 3 | AUG |
| RPS17B | 10.1283 | 3 | AUG |
| RPS24A | 9.4227 | 3 | AUG |
| RPS24B | 11.3548 | 3 | AUG |
| RPS27A | 6.3612 | 3 | AUG |
| RPS27B | 10.3823 | 3 | AUG |
| RPS30A | 10.8845 | 3 | AUG |
| RPS30B | 6.2290 | 3 | AUG |
| UBC12 | 8.4505 | 3 | AUG |
| VMA10 | 8.2722 | 3 | AUG |
| YSF3 | 7.1596 | 3 | AUG |
*Position weight matrix score at 3′ ss.
Site-specific frequencies and position weight matrix (PWM) for 275 5′ ss. The consensus sequence (UA**AAG**∣**GUAUGUU**UAAUU) can be obtained from the largest site-specific PWM entries, with the most important sites in bold. The χ² test is performed for each site against the background frequencies (A = 0.3279, C = 0.1915, G = 0.2043, and U = 0.2763). The nucleotide sites are labeled with the five exon nucleotides as −5 to −1 and the 12 intron nucleotides as 1 to 12. The PWM is nearly identical when the introns in the 5′ UTR are excluded.
| Site | A | C | G | U | χ² | p | PWM A | PWM C | PWM G | PWM U |
|---|---|---|---|---|---|---|---|---|---|---|
| −5 | 94 | 32 | 57 | 92 | 11.798 | 0.0081088 | 0.0641 | −0.7117 | 0.0245 | 0.2792 |
| −4 | 119 | 47 | 48 | 61 | 14.117 | 0.0027505 | 0.4032 | −0.1599 | −0.2225 | −0.3115 |
| −3 | 139 | 38 | 43 | 55 | 39.672 | 0.0000001 | | −0.4651 | −0.3805 | −0.4601 |
| −2 | 138 | 40 | 36 | 61 | 38.899 | 0.0000001 | | −0.3915 | −0.6355 | −0.3115 |
| −1 | 91 | 45 | 88 | 51 | 27.270 | 0.0000052 | 0.0174 | −0.2223 | | −0.5685 |
| 1 | 0 | 1 | 274 | 0 | 1060.426 | 0.0000004 | −8.1042 | −5.4675 | | −8.1044 |
| 2 | 0 | 9 | 0 | 266 | 658.096 | 0.0000003 | −8.1042 | −2.5200 | −8.1048 | |
| 3 | 268 | 1 | 2 | 4 | 522.754 | 0.0000003 | | −5.4675 | −4.6732 | −4.1523 |
| 4 | 17 | 29 | 1 | 228 | 428.607 | 0.0000002 | −2.3805 | −0.8528 | −5.5454 | |
| 5 | 2 | 0 | 272 | 1 | 1041.047 | 0.0000004 | −5.2765 | −8.1049 | | −5.8967 |
| 6 | 10 | 8 | 2 | 255 | 583.545 | 0.0000003 | −3.1271 | −2.6862 | −4.6732 | |
| 7 | 97 | 18 | 39 | 121 | 55.570 | 0.0000001 | 0.1092 | −1.5351 | −0.5206 | |
| 8 | 95 | 54 | 35 | 91 | 11.363 | 0.0099180 | 0.0793 | 0.0397 | −0.6759 | 0.2635 |
| 9 | 123 | 45 | 34 | 73 | 22.172 | 0.0000601 | 0.4508 | −0.2223 | −0.7175 | −0.0534 |
| 10 | 118 | 41 | 38 | 78 | 17.334 | 0.0006034 | 0.3911 | −0.3560 | −0.5579 | 0.0418 |
| 11 | 105 | 33 | 43 | 94 | 17.367 | 0.0005940 | 0.2232 | −0.6676 | −0.3805 | 0.3101 |
| 12 | 90 | 44 | 42 | 99 | 12.109 | 0.0070180 | 0.0015 | −0.2546 | −0.4142 | 0.3847 |
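As a minimal sketch of how a PWM and a PWMS relate to the counts above, the following code builds log2-odds entries for intron sites 1 to 6 of the 5′ ss and scores a candidate hexamer. The pseudocount scheme (`alpha` = 1, distributed in proportion to the background frequencies) is an assumption, since the record does not state how zero counts were handled; the resulting entries agree with the tabulated ones only to about two decimal places. The function names are illustrative.

```python
import math

# Background frequencies quoted in the table caption
BG = {"A": 0.3279, "C": 0.1915, "G": 0.2043, "U": 0.2763}

# Observed counts at intron sites 1..6 of the 5' ss, copied from the table
COUNTS = [
    {"A": 0, "C": 1, "G": 274, "U": 0},
    {"A": 0, "C": 9, "G": 0, "U": 266},
    {"A": 268, "C": 1, "G": 2, "U": 4},
    {"A": 17, "C": 29, "G": 1, "U": 228},
    {"A": 2, "C": 0, "G": 272, "U": 1},
    {"A": 10, "C": 8, "G": 2, "U": 255},
]

def build_pwm(counts, bg, alpha=1.0):
    """Log2-odds matrix; alpha is an ASSUMED pseudocount spread by background frequency."""
    pwm = []
    for site in counts:
        n = sum(site.values())
        pwm.append({
            nuc: math.log2(((site[nuc] + alpha * bg[nuc]) / (n + alpha)) / bg[nuc])
            for nuc in bg
        })
    return pwm

def pwms(seq, pwm):
    """Position weight matrix score: sum of the per-site entries selected by seq."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the number of PWM sites")
    return sum(col[nuc] for nuc, col in zip(seq, pwm))

PWM = build_pwm(COUNTS, BG)
# The consensus is the nucleotide with the largest entry at each site.
CONSENSUS = "".join(max(col, key=col.get) for col in PWM)  # "GUAUGU"
```

Calling `pwms("GUAUGU", PWM)` gives the maximum score over these six sites, while any substitution, for example `pwms("GUACGU", PWM)`, lowers it. This is the sense in which PWMS serves as a proxy for splicing strength.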
Site-specific frequencies and position weight matrix (PWM) for 301 3′ ss. The consensus sequence (∣GCUUC) can be obtained from the largest site-specific PWM entries, with the most important sites in bold. The χ² test is performed for each site against the expected background frequencies. The sites are labeled with the first exon site as 1. The PWM is nearly identical when the introns in the 5′ UTR are excluded.
| Site | A | C | G | U | χ² | p | PWM A | PWM C | PWM G | PWM U |
|---|---|---|---|---|---|---|---|---|---|---|
| −12 | 70 | 58 | 37 | 136 | 51.729 | 0.0000001 | −0.4898 | 0.0122 | −0.7264 | |
| −11 | 79 | 51 | 23 | 148 | 79.511 | 0.0000001 | −0.3161 | −0.1727 | −1.4074 | |
| −10 | 86 | 45 | 14 | 156 | 105.131 | 0.0000001 | −0.1941 | −0.3525 | −2.1155 | |
| −9 | 43 | 33 | 23 | 202 | 236.063 | 0.0000001 | −1.1886 | −0.7978 | −1.4074 | |
| −8 | 56 | 43 | 31 | 171 | 130.216 | 0.0000001 | −0.8100 | −0.4178 | −0.9801 | |
| −7 | 102 | 35 | 31 | 133 | 54.256 | 0.0000001 | 0.0512 | −0.7134 | −0.9801 | |
| −6 | 103 | 46 | 38 | 114 | 23.130 | 0.0000380 | 0.0653 | −0.3210 | −0.6881 | |
| −5 | 100 | 36 | 25 | 140 | 68.925 | 0.0000000 | 0.0228 | −0.6729 | −1.2882 | |
| −4 | 145 | 27 | 41 | 88 | 45.473 | 0.0000001 | | −1.0854 | −0.5790 | 0.0850 |
| −3 | 15 | 127 | 0 | 159 | 284.824 | 0.0000002 | −2.6877 | | −8.2350 | |
| −2 | 299 | 1 | 1 | 0 | 605.789 | 0.0000003 | | −5.5977 | −5.6756 | −8.2346 |
| −1 | 0 | 0 | 301 | 0 | 1171.443 | 0.0000004 | −8.2345 | −8.2351 | | −8.2346 |
| 1 | 109 | 39 | 74 | 79 | 9.936 | 0.0191208 | 0.1467 | −0.5580 | 0.2697 | −0.0701 |
| 2 | 84 | 66 | 55 | 96 | 6.036 | 0.1098600 | −0.2279 | 0.1981 | −0.1571 | 0.2102 |
| 3 | 103 | 58 | 50 | 90 | 2.969 | 0.3964877 | 0.0653 | 0.0122 | −0.2940 | 0.1173 |
| 4 | 96 | 45 | 56 | 104 | 8.655 | 0.0342400 | −0.0359 | −0.3525 | −0.1312 | 0.3253 |
| 5 | 100 | 69 | 39 | 93 | 11.698 | 0.0084938 | 0.0228 | 0.2620 | −0.6508 | 0.1645 |
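The per-site χ² test can be illustrated with a short goodness-of-fit sketch. It assumes the same background frequencies quoted for the 5′ ss table and three degrees of freedom (four nucleotides minus one). The statistic computed this way is close to, but not identical with, the tabulated value, so the original analysis may have used slightly different expected counts. The closed-form tail probability below is specific to three degrees of freedom.

```python
import math

# ASSUMPTION: same background frequencies as quoted for the 5' ss table
BG = {"A": 0.3279, "C": 0.1915, "G": 0.2043, "U": 0.2763}

def chi_square(counts, bg):
    """Pearson goodness-of-fit statistic of observed counts against the background."""
    n = sum(counts.values())
    return sum((counts[nuc] - n * bg[nuc]) ** 2 / (n * bg[nuc]) for nuc in bg)

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Exon site 1 of the 3' ss table
site1 = {"A": 109, "C": 39, "G": 74, "U": 79}
stat = chi_square(site1, BG)    # roughly 9.9; the table lists 9.936
p_value = chi2_sf_df3(stat)     # tail probability for the computed statistic
p_from_table = chi2_sf_df3(9.936)  # close to the tabulated 0.0191208
```

Feeding the tabulated χ² values into `chi2_sf_df3` recovers the tabulated p-values, which supports the reading that each site was tested with three degrees of freedom.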
Figure 1. Sequence logos of 5′ ss (a) and 3′ ss (b), produced with the background frequencies specified as A = 0.3279, C = 0.1915, G = 0.2043, and U = 0.2763. The nucleotides whose frequencies are lower than expected are plotted upside down. The vertical bar is the information index computed as $-\left[\sum_i P_i \log_2(P_i)\right]$, where $P_i$ is the frequency of nucleotide $i$ ($i$ = A, C, G, or U) at each site.
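The information index in the caption is the Shannon entropy of the nucleotide frequencies at a site. A minimal sketch of the formula as stated follows, using the convention that a nucleotide with zero frequency contributes nothing to the sum; the function name is illustrative.

```python
import math

def information_index(counts):
    """Shannon entropy -sum(P_i * log2(P_i)) in bits; zero-frequency terms contribute 0."""
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)

# Four equally frequent nucleotides give the maximum of 2 bits.
uniform = information_index({"A": 25, "C": 25, "G": 25, "U": 25})

# Intron site 1 of the 5' ss (almost always G) is close to 0 bits.
site_g = information_index({"A": 0, "C": 1, "G": 274, "U": 0})
```

Under this formula, strongly conserved sites such as the intronic GU score near zero, and unconstrained sites approach two bits.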
Evaluating statistical significance of individual nucleotide sites (site, with 5 nucleotides on the exon side labelled −5 to −1 and 12 on the intron side labeled 1 to 12) of 5′ ss by two types of false discovery rate.
| Site | p | pBH(1) | pBY(2) |
|---|---|---|---|
| 1 | *0.0000000000† | 0.002941 | 0.000855 |
| 5 | *0.0000000000† | 0.005882 | 0.001710 |
| 2 | *0.0000000000† | 0.008824 | 0.002565 |
| 6 | *0.0000000000† | 0.011765 | 0.003420 |
| 3 | *0.0000000000† | 0.014706 | 0.004276 |
| 4 | *0.0000000000† | 0.017647 | 0.005131 |
| 7 | *0.0000000000† | 0.020588 | 0.005986 |
| −2 | *0.0000004842† | 0.023529 | 0.006841 |
| −3 | *0.0000013734† | 0.026471 | 0.007696 |
| −1 | *0.0000030965† | 0.029412 | 0.008551 |
| 9 | *0.0002619304† | 0.032353 | 0.009406 |
| 10 | *0.0006307900† | 0.035294 | 0.010261 |
| 12 | *0.0025004071† | 0.038235 | 0.011116 |
| 11 | *0.0033589734† | 0.041176 | 0.011971 |
| 8 | *0.0084455695† | 0.044118 | 0.012827 |
| −5 | *0.0177349476 | 0.047059 | 0.013682 |
| −4 | *0.0182291629 | 0.050000 | 0.014537 |
(1)Critical P based on [54].
(2)Critical P based on [55].
*Significant by the criterion in [54].
†Significant by the criterion in [55].
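The critical values in the table are consistent with two step-up false discovery rate procedures at a rate of 0.05 over 17 sites: the Benjamini-Hochberg values i × 0.05 / 17, and the Benjamini-Yekutieli values, which divide these by the harmonic sum 1 + 1/2 + … + 1/17. The sketch below assumes that pBH and pBY denote these two procedures, which the record itself does not spell out, and reproduces the tabulated thresholds and the significance calls.

```python
# p-values for the 17 sites of the 5' ss, in ascending order, copied from the table
P_VALUES = [0.0] * 7 + [
    0.0000004842, 0.0000013734, 0.0000030965, 0.0002619304, 0.0006307900,
    0.0025004071, 0.0033589734, 0.0084455695, 0.0177349476, 0.0182291629,
]

def critical_values(m, q=0.05, dependent=False):
    """Step-up critical values i*q/m; divided by sum(1/k) when dependent=True."""
    c = sum(1.0 / k for k in range(1, m + 1)) if dependent else 1.0
    return [i * q / (m * c) for i in range(1, m + 1)]

def n_significant(sorted_p, crit):
    """Largest rank whose p-value is at or below its critical value (0 if none)."""
    passing = [i for i, (p, c) in enumerate(zip(sorted_p, crit), start=1) if p <= c]
    return max(passing, default=0)

M = len(P_VALUES)
P_BH = critical_values(M)                  # ASSUMED Benjamini-Hochberg thresholds
P_BY = critical_values(M, dependent=True)  # ASSUMED Benjamini-Yekutieli thresholds
N_BH = n_significant(P_VALUES, P_BH)       # 17: every site passes
N_BY = n_significant(P_VALUES, P_BY)       # 15: sites -5 and -4 fail
```

All 17 sites pass the first criterion and 15 pass the more conservative second one, matching the asterisks and daggers in the table.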
Figure 2. Relationship between splicing strength measured by position weight matrix score (PWMS) at 5′ (a) and 3′ (b) splice sites (5′ ss and 3′ ss) and gene expression measured by codon adaptation index.
Figure 3. Relationship between splicing strength measured by position weight matrix score (PWMS) at 5′ (a) and 3′ (b) splice sites (5′ ss and 3′ ss) and gene expression measured by mRNA abundance [28]. The mRNA abundance is log transformed. A similar pattern is observed when the mRNA abundance from Holstege et al. [27] is used.
Figure 4. Relationship between splicing strength measured by position weight matrix score (PWMS) at 5′ (a) and 3′ (b) splice sites (5′ ss and 3′ ss) and gene expression measured by protein synthesis rate [30], which is log transformed. A similar pattern is observed when the protein synthesis rate is replaced by protein abundance from Ghaemmaghami et al. [29].
Testing the predictions that introns in highly expressed genes have higher PWMS and smaller variance in PWMS than in lowly expressed genes, with gene expression measured by CAI, mRNA, and protein abundance. Introns spliced by nonspliceosomal mechanisms are excluded. Mann-Whitney tests generate similar results. All tests are two tailed. The results are nearly identical when mRNA abundance from microarray [27] is used instead of that from GATC-PCR [28] or when protein abundance [29] is used instead of the protein synthesis rate [30].
| | 5′ ss | | | 3′ ss | | |
|---|---|---|---|---|---|---|
| | CAI | lnMRNA(1) | lnPROT(2) | CAI | lnMRNA | lnPROT |
| N(3) | 91 | 48 | 55 | 100 | 53 | 67 |
| MeanH(4) | 11.4927 | 11.3128 | 11.4879 | 8.7188 | 8.5447 | 8.6004 |
| MeanL(5) | 9.4135 | 9.7143 | 9.7581 | 5.1109 | 4.9729 | 5.3359 |
| DF(6) | 113 | 59 | 55 | 155 | 83 | 82 |
| t | 4.1635 | 2.2501 | 2.2411 | 9.7833 | 6.9719 | 5.7687 |
| p | 0.0001 | 0.0282 | 0.0291 | <0.0001 | <0.0001 | <0.0001 |
| VarH(7) | 3.2053 | 2.8657 | 2.7345 | 2.6326 | 3.4395 | 3.4220 |
| VarL(8) | 10.3934 | 21.3602 | 24.6692 | 20.0609 | 10.4712 | 9.9223 |
| F | 3.2429 | 7.4537 | 9.0214 | 7.6211 | 3.0444 | 2.9000 |
| p | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | 0.0001 |
(1)Natural logarithm of mRNA abundance [28].
(2)Natural logarithm of protein synthesis rate [30].
(3)Number of ss in the highly expressed and lowly expressed groups (note that N1 = N2 = N).
(4)Mean PWMS in highly expressed group.
(5)Mean PWMS in lowly expressed group.
(6)The t-test assuming unequal variance is used, so DF is not equal to (N1 + N2 − 2).
(7)Variance in the highly expressed group.
(8)Variance in the lowly expressed group.
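The row of values directly below VarL in the table (3.2429, 7.4537, 9.0214, 7.6211, 3.0444, 2.9000) appears to be the variance ratio VarL / VarH used to compare the spread of PWMS between the two expression groups. The following sketch checks that reading against the tabulated variances. It recomputes only the ratio, not the associated p-values, and the names are illustrative.

```python
# (VarH, VarL, tabulated ratio) per column:
# 5' ss CAI, lnMRNA, lnPROT, then 3' ss CAI, lnMRNA, lnPROT
COLUMNS = [
    (3.2053, 10.3934, 3.2429),
    (2.8657, 21.3602, 7.4537),
    (2.7345, 24.6692, 9.0214),
    (2.6326, 20.0609, 7.6211),
    (3.4395, 10.4712, 3.0444),
    (3.4220, 9.9223, 2.9000),
]

def variance_ratio(var_high, var_low):
    """Ratio of the lowly expressed group's variance to the highly expressed group's."""
    return var_low / var_high

RATIOS = [variance_ratio(vh, vl) for vh, vl, _ in COLUMNS]
# Largest disagreement with the table; small differences reflect rounding of the variances.
MAX_DIFF = max(abs(r - tab) for r, (_, _, tab) in zip(RATIOS, COLUMNS))
```

Every ratio exceeds 1 because PWMS varies more among lowly expressed genes than among highly expressed ones, which is the second prediction tested in the table.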
Position weight matrix scores (PWMSs, as a proxy for splicing strength) are significantly smaller for splice sites from intron-containing genes (ICGs) whose transcripts failed to recruit U1 snRNPs (NRG, for nonrecruiting group) than for those from ICGs whose transcripts bind well to U1 snRNPs (RG, for recruiting group). The pattern is consistent for both 5′ ss and 3′ ss, based on two-sample t-tests assuming equal variances. Mann-Whitney tests yield the same conclusion.
| | 5′ ss | | 3′ ss | |
|---|---|---|---|---|
| | NRG | RG | NRG | RG |
| PWMS mean | 8.8138 | 11.1978 | 5.3129 | 7.1762 |
| PWMS Var. | 31.5069 | 4.8646 | 13.3017 | 8.2077 |
| N | 44 | 231 | 49 | 252 |
| t | −4.6346 | | −3.9257 | |
| p | 0.0000 | | 0.0001 | |