Simon Minovitsky, Sherry L Gee, Shiruyeh Schokrpur, Inna Dubchak, John G Conboy.
Abstract
Previous studies have identified UGCAUG as an intron splicing enhancer that is frequently located adjacent to tissue-specific alternative exons in the human genome. Here, we show that UGCAUG is phylogenetically and spatially conserved in introns that flank brain-enriched alternative exons from fish to man. Analysis of sequence from the mouse, rat, dog, chicken and pufferfish genomes revealed a strongly statistically significant association of UGCAUG with the proximal intron region downstream of brain-enriched alternative exons. The number, position and sequence context of intronic UGCAUG elements were highly conserved among mammals and in chicken, but more divergent in fish. Control datasets, including constitutive exons and non-tissue-specific alternative exons, exhibited a much lower incidence of closely linked UGCAUG elements. We propose that the high sequence specificity of the UGCAUG element, and its unique association with tissue-specific alternative exons, mark it as a critical component of splicing switch mechanism(s) designed to activate a limited repertoire of splicing events in cell type-specific patterns. We further speculate that highly conserved UGCAUG-binding protein(s) related to the recently described Fox-1 splicing factor play a critical role in mediating this specificity.
Year: 2005 PMID: 15691898 PMCID: PMC548355 DOI: 10.1093/nar/gki210
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Brain-enriched alternatively spliced exons examined in this study
| | Gene name | Exon size (nt) | Human coordinates, July 03 | Mouse | Rat | Dog | Chicken | Fugu |
|---|---|---|---|---|---|---|---|---|
| 1 | ACVR2 | 24 | chr2:148879875–148879898(+) | + | + | + | + | + |
| 2 | ANK1 | 24 | chr8:41575288–41575311(−) | + | + | + | + | |
| 3 | ATP2B4 | 178 | chr1:200879784–200879961(+) | + | + | + | + | |
| 4 | BAIAP | 84 | chr3:65330156–65330239(−) | + | + | + | + | |
| 5 | BIN1 | 93 | chr2:127919008–127919100(−) | + | + | + | ||
| 6 | CACNCB1 | 63 | chr9:136130205–136130267(+) | + | + | + | + | + |
| 7 | CLTB | 54 | chr5:175804403–175804456(−) | + | + | + | + | + |
| 8 | DCAMKL1 | 74 | chr13:34160349–34160422(−) | + | + | + | + | + |
| 9 | DLG1 | 36 | chr3:198121894–198121929(−) | + | + | + | + | |
| 10 | DNM3 | 30 | chr1:169302212–169302241(+) | + | + | + | + | |
| 11 | EPB41L1 | 36 | chr20:35498680–35498715(+) | + | + | + | + | |
| 12 | EPB41L1 | 411 | chr20:35512839–35513249(+) | + | + | + | + | |
| 13 | EPB41L3 | 36 | chr18:5397700–5397735(−) | + | + | + | + | |
| 14 | EPB41L3 | 123 | chr18:5388021–5388143(−) | + | + | + | + | |
| 15 | EWSR | 18 | chr22:27994808–27994835(+) | + | + | + | + | |
| 16 | FHL1 | 200 | chrX:133997009–133997208(+) | + | + | + | ||
| 17 | GABRG2 | 24 | chr5:161558631–161560654(+) | + | + | + | + | |
| 18 | GRIN1 | 63 | chr9:135399897–135399959(+) | + | + | + | + | + |
| 19 | KSR | 42 | chr17:26073950–26073991(+) | + | + | + | + | + |
| 20 | MAG | 45 | chr19:40495024–40495068(+) | + | + | |||
| 21 | MYH10 | 30 | chr17:8680527–8680556(−) | + | + | + | + | + |
| 22 | MYH10 | 63 | chr17:8634507–8634569(−) | + | + | + | + | + |
| 23 | NF1 | 30 | chr17:29675683–29675712(+) | + | + | + | ||
| 24 | PTPB2 | 34 | chr1:96743776–96743809(+) | + | + | + | + | + |
| 25 | PTPRF | 27 | chr1:43480037–43482062(+) | + | + | + | + | + |
| 26 | SCN8a | 123 | chr12:50460700–50460822(+) | + | + | + | + | + |
| 27 | SRC | 18 | chr20:36700292–36700309(+) | + | + | + | + | + |
(+) and (−) indicate the direction of transcription for the selected genes.
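The exon sizes in the table can be cross-checked against the coordinate strings. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive (so length = end − start + 1), which agrees with the listed sizes for rows such as ACVR2 and SRC. The function names `parse_coordinates` and `exon_length` are hypothetical and not part of the original study.

```python
import re

# Matches strings such as "chr2:148879875–148879898(+)".
# Accepts an en dash or a hyphen between the positions, and either an
# ASCII "-" or a Unicode minus sign for the strand.
COORD_RE = re.compile(r"^(chr\w+):(\d+)[–-](\d+)\(([+\-−])\)$")


def parse_coordinates(text):
    """Split a coordinate string into (chromosome, start, end, strand)."""
    match = COORD_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate string: {text!r}")
    chrom, start, end, strand = match.groups()
    # Normalize the strand symbol to plain ASCII.
    strand = "+" if strand == "+" else "-"
    return chrom, int(start), int(end), strand


def exon_length(text):
    """Exon length in nt, assuming 1-based inclusive coordinates (an assumption)."""
    _, start, end, _ = parse_coordinates(text)
    return end - start + 1


if __name__ == "__main__":
    print(exon_length("chr2:148879875–148879898(+)"))   # ACVR2 row
    print(parse_coordinates("chr8:41575288–41575311(−)"))  # ANK1 row
```

Comparing the computed length with the "Exon size (nt)" column is a quick way to spot transcription errors in either field.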
Over-represented hexamers in the proximal downstream intron region D400
| Sequence | Brain frequency ×10⁻³ (rank) | Control frequency ×10⁻³ (rank) | Contrast score | P-value |
|---|---|---|---|---|
| Human | ||||
| UGCAUG | 2.53 (1) | 0.25 (1310) | 2.29 | 0.031 |
| UUUUAA | 1.72 (2) | 0.28 (1103) | 1.44 | 0.240 |
| UUUGUU | 1.72 (2) | 0.39 (718) | 1.33 | 0.301 |
| UUGUUU | 1.63 (5) | 0.36 (822) | 1.28 | 0.343 |
| UGUUUU | 1.72 (2) | 0.45 (555) | 1.27 | 0.347 |
| UGCUUU | 1.63 (5) | 0.37 (759) | 1.26 | 0.357 |
| UUUUAU | 1.45 (7) | 0.21 (1563) | 1.24 | 0.375 |
| UCAUUU | 1.36 (9) | 0.22 (1459) | 1.14 | 0.494 |
| UUUAUU | 1.36 (9) | 0.24 (1384) | 1.12 | 0.506 |
| GCAUGC | 1.27 (11) | 0.20 (1597) | 1.07 | 0.580 |
| Mouse | ||||
| UGCAUG | 2.36 (1) | 0.54 (283) | 1.81 | 0.030 |
| CUGCAU | 1.45 (5) | 0.38 (836) | 1.07 | 0.391 |
| UUUCUG | 1.72 (2) | 0.66 (124) | 1.07 | 0.396 |
| AUUUUA | 1.36 (7) | 0.41 (706) | 0.95 | 0.575 |
| UUUAAA | 1.27 (9) | 0.42 (627) | 0.85 | 0.803 |
| CAUUUG | 1.09 (20) | 0.28 (1512) | 0.81 | 0.867 |
| UGAUUU | 1.09 (20) | 0.35 (1016) | 0.74 | 0.964 |
| ACCAGG | 1.18 (13) | 0.44 (561) | 0.74 | 0.969 |
| UGCUUU | 1.18 (13) | 0.45 (528) | 0.73 | 0.974 |
| UUUUAA | 1.18 (13) | 0.45 (482) | 0.71 | 0.983 |
| Rat | ||||
| UGCAUG | 3.03 (1) | 0.46 (374) | 2.56 | 0.028 |
| CUGCAU | 2.05 (2) | 0.30 (1283) | 1.75 | 0.119 |
| UGCUUC | 1.46 (4) | 0.43 (468) | 1.03 | 0.543 |
| UGUCUG | 1.66 (3) | 0.67 (53) | 0.99 | 0.594 |
| GCAUGC | 1.17 (15) | 0.29 (1453) | 0.89 | 0.780 |
| GCAUGU | 1.17 (15) | 0.36 (791) | 0.81 | 0.908 |
| UCGCUG | 0.88 (54) | 0.08 (3151) | 0.79 | 0.927 |
| CCCUCU | 1.37 (6) | 0.58 (112) | 0.78 | 0.941 |
| CCCACC | 1.27 (9) | 0.50 (264) | 0.76 | 0.958 |
| CUGAUU | 1.07 (25) | 0.31 (1192) | 0.76 | 0.962 |
| Dog | ||||
| UGCAUG | 2.73 (1) | 0.25 (1310) | 2.49 | 0.024 |
| UGUUUU | 1.76 (2) | 0.45 (555) | 1.31 | 0.360 |
| UUUGUU | 1.66 (3) | 0.39 (718) | 1.27 | 0.391 |
| UUUUUG | 1.66 (3) | 0.44 (588) | 1.22 | 0.447 |
| UGCUUU | 1.56 (5) | 0.37 (759) | 1.19 | 0.488 |
| AUUUUU | 1.56 (5) | 0.40 (685) | 1.16 | 0.521 |
| UCCUUU | 1.46 (8) | 0.32 (935) | 1.14 | 0.552 |
| UUUAUU | 1.37 (9) | 0.24 (1384) | 1.13 | 0.565 |
| UUUGCU | 1.37 (9) | 0.30 (1041) | 1.07 | 0.654 |
| CUGCAU | 1.37 (9) | 0.30 (1023) | 1.06 | 0.663 |
| Chicken | ||||
| UGCAUG | 2.12 (2) | 0.50 (340) | 1.62 | 0.017 |
| UGUUUU | 2.22 (1) | 1.02 (4) | 1.19 | 0.199 |
| GUUUUU | 1.69 (4) | 0.71 (52) | 0.98 | 0.553 |
| UUUGUU | 1.80 (3) | 0.91 (13) | 0.89 | 0.778 |
| UUUUUU | 1.69 (4) | 0.84 (19) | 0.85 | 0.859 |
| CAUGCU | 1.27 (14) | 0.44 (515) | 0.83 | 0.906 |
| AUGCAU | 1.16 (19) | 0.36 (890) | 0.80 | 0.941 |
| CUUUUU | 1.48 (8) | 0.69 (62) | 0.79 | 0.951 |
| UAUAUA | 0.95 (53) | 0.19 (2299) | 0.77 | 0.977 |
| UGGUUC | 0.95 (53) | 0.21 (2089) | 0.74 | 0.987 |
| Fugu | ||||
| UGCAUG | 3.00 (1) | 0.61 (227) | 2.39 | 0.010 |
| UUAAAU | 1.62 (6) | 0.22 (1635) | 1.39 | 0.789 |
| CCAUUU | 1.62 (6) | 0.28 (1290) | 1.34 | 0.807 |
| UUUUUC | 2.08 (2) | 0.77 (76) | 1.30 | 0.884 |
| GGAUGG | 1.38 (16) | 0.11 (2562) | 1.27 | 0.893 |
| AAAUGG | 1.38 (16) | 0.11 (2562) | 1.27 | 0.893 |
| UGUGUG | 1.85 (4) | 0.61 (227) | 1.24 | 0.956 |
| UUGCAU | 1.62 (6) | 0.39 (757) | 1.23 | 0.956 |
| AUAAUC | 1.38 (16) | 0.17 (2041) | 1.22 | 0.956 |
| CUUUUU | 1.85 (4) | 0.66 (163) | 1.18 | 0.964 |
*Indicates results that are statistically significant (P-value < 0.05). In each species, UGCAUG is the only hexamer listed that falls below this threshold.
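In the table above, the contrast score matches the difference between the brain and control frequencies to within rounding (for human UGCAUG, 2.53 − 0.25 ≈ 2.29). The sketch below illustrates that reading. It is an inference from the tabulated values, not the authors' stated definition, and the frequency definition used here (occurrences of a hexamer divided by the total number of hexamer windows) is likewise an assumption. The function names are hypothetical, and the P-value calculation is not reproduced.

```python
from collections import Counter


def kmer_frequencies(sequences, k=6):
    """Relative frequency of each k-mer over all sliding windows.

    Assumption: frequency = occurrences / total number of k-mer windows.
    DNA input is converted to RNA letters (T -> U) to match the table.
    """
    counts = Counter()
    total = 0
    for seq in sequences:
        seq = seq.upper().replace("T", "U")
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
            total += 1
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def contrast_scores(test_seqs, control_seqs, k=6):
    """Contrast = frequency in the test set minus frequency in the control set."""
    test = kmer_frequencies(test_seqs, k)
    control = kmer_frequencies(control_seqs, k)
    return {kmer: freq - control.get(kmer, 0.0) for kmer, freq in test.items()}


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not data from the study.
    brain = ["UGCAUGUGCAUG"]
    control = ["AAAAAAAAAAAA"]
    scores = contrast_scores(brain, control)
    for kmer, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(kmer, round(score, 3))
```

Sorting by contrast rather than by raw frequency is what pushes common low-complexity hexamers (for example U-rich runs that are abundant in both sets) down the list.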
Nucleotide preferences flanking GCAUG and UGCAUG
| | Human | Mouse | Rat | Dog | Chicken | Fugu |
|---|---|---|---|---|---|---|
| (A) Abundance of hexamers containing GCAUG in the D400 region | ||||||
| | 28 (65%) | 28 (74%) | 31 (79%) | 26 (74%) | 20 (67%) | 13 (93%) |
| | 8 (19%) | 5 (13%) | 5 (13%) | 5 (14%) | 6 (20%) | 1 (7%) |
| | 4 | 2 | 2 | 3 | 2 | 0 |
| | 3 | 3 | 1 | 1 | 2 | 0 |
| (B) Abundance of heptamers containing UGCAUG in the D400 region | ||||||
| | 9 | 8 | 6 | 6 | 7 | 6 |
| | 11 | 11 | 13 | 11 | 6 | 2 |
| | 5 | 6 | 8 | 4 | 5 | 3 |
| | 3 | 4 | 4 | 5 | 2 | 2 |
| UGCAUG | 7 | 10 | 8 | 7 | 5 | 5 |
| UGCAUG | 7 | 6 | 9 | 6 | 4 | 1 |
| UGCAUG | 7 | 7 | 8 | 8 | 6 | 2 |
| UGCAUG | 7 | 6 | 6 | 5 | 5 | 5 |
Underlined residues indicate nucleotides flanking the core GCAUG or UGCAUG motif.
Most over-represented hexamers in the non-tissue-specific dataset, D400 region
| Sequence | Frequency in dataset ×10⁻³ | Frequency in control ×10⁻³ | Contrast |
|---|---|---|---|
| Human | |||
| AUUUUU | 1.60 | 0.40 | 1.20 |
| UAUUUU | 1.37 | 0.32 | 1.05 |
| UUAUUU | 1.19 | 0.26 | 0.93 |
| UUUUUA | 1.22 | 0.34 | 0.88 |
| UUUUUC | 1.22 | 0.37 | 0.85 |
| UUUUUU | 1.45 | 0.60 | 0.85 |
| CUUUUU | 1.24 | 0.42 | 0.83 |
| UUCUUU | 1.22 | 0.39 | 0.83 |
| UUUAUU | 1.04 | 0.24 | 0.80 |
| UCUUUU | 1.09 | 0.34 | 0.76 |
Figure 1. Enrichment of candidate UGCAUG alternative splicing enhancers in the proximal downstream intron region is conserved among vertebrate species. Histograms show the frequency of UGCAUG elements at 100 nt intervals in the introns upstream and downstream of the brain-enriched exons. Top six panels represent the analysis of intron sequences for the vertebrate species indicated at the left. Note the highest abundance of UGCAUG elements is consistently within the downstream ∼400 nt (D400) region. The bottom panel shows the much lower incidence of UGCAUG elements that occurs near a group of non-tissue-specific alternative exons. Vertical axis, frequency; horizontal axis, nucleotide range relative to the alternative exon.
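The 100 nt binning described in the Figure 1 caption can be sketched as follows. This is an illustrative outline only: it assumes each downstream intron is supplied 5′→3′ starting at the first nucleotide after the exon, it reports raw counts rather than the normalized frequency plotted in the figure, and the function names and the `n_bins` default are hypothetical.

```python
def motif_positions(seq, motif="UGCAUG"):
    """0-based start positions of every (possibly overlapping) motif match."""
    seq = seq.upper().replace("T", "U")
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def bin_counts(intron_seqs, motif="UGCAUG", bin_size=100, n_bins=10):
    """Count motif starts per bin of `bin_size` nt across all sequences.

    Bin 0 covers positions 0-99 from the exon boundary, bin 1 covers
    100-199, and so on. Matches beyond the last bin are ignored.
    """
    counts = [0] * n_bins
    for seq in intron_seqs:
        for pos in motif_positions(seq, motif):
            index = pos // bin_size
            if index < n_bins:
                counts[index] += 1
    return counts


if __name__ == "__main__":
    # Toy intron for illustration only; this is not data from the study.
    intron = "UGCAUG" + "A" * 100 + "UGCAUG"
    print(bin_counts([intron], n_bins=3))
```

Under this layout the D400 region corresponds to the first four bins. For upstream introns the distance would have to be measured back from the 3′ end of the intron instead.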
Figure 2. Specific location of UGCAUG hexamers near individual brain-enriched exons is highly conserved evolutionarily. The figure shows all of the brain-enriched exons for which phylogenetically conserved UGCAUG elements were identified in the flanking introns. Location of hexamers is indicated by small rectangles or circles that are color coded to indicate species. Blue, human; yellow, mouse; green, rat; red, dog; brown, chicken; and black circle, Fugu. Regulated brain-enriched exons are at the center; intron sequences to the left represent upstream sequences while those to the right represent downstream sequences.
Figure 3. UGCAUG hexamers are often situated within larger regions of conserved intron sequence. Local alignments of nucleotide sequences flanking several conserved UGCAUG elements are shown. Nucleotides shared among at least two species are shaded. Notably, UGCAUG hexamers are located within regions of conservation that may be several hundred nucleotides distal to the regulated exon. Numbers on the left indicate the distance of the UGCAUG element from the exon.
Most under-represented trinucleotides, D400 region
| Human | Mouse | Rat | Dog | Chicken | Pufferfish | Non-tissue-specific |
|---|---|---|---|---|---|---|
| AGG −1.62 | AGG −0.94 | GGC −1.90 | GAU −1.13 | GGC −1.84 | ||
| AGG −2.00 | GGA −1.08 | GAA −0.75 | GAG −1.09 | GGA −0.93 | ||
| GGC −1.91 | CAG −0.84 | GGA −0.73 | GCC −1.75 | AGG −1.03 | GCC −1.59 | |
| GCC −1.53 | AAA −0.47 | AGG −1.74 | GUG −0.73 | UGA −0.67 | CCC −1.19 | |
| CAG −1.47 | GAG −0.79 | AAU −0.43 | CCC −1.43 | AGC −0.70 | UGG −0.64 | UGG −1.05 |
| CCC −1.20 | ACA −0.50 | CAG −1.32 | GGC −0.65 | GCA −0.62 | CGG −1.01 |
*Indicates the position of GGG among under-represented trinucleotides in each dataset.
Figure 4. Alternative splicing of brain-enriched alternative exons is conserved in mouse. Alternative splicing of brain-enriched exons was examined by RT–PCR analysis of RNA isolated from six different mouse tissues. The figure shows polyacrylamide gel analyses indicating the amplification of brain-enriched isoforms (filled arrowheads) and non-tissue-specific isoforms (open arrowheads). The identity of each alternative exon is indicated above the corresponding gel. Tissues used for PCR analysis: Br, brain; H, heart; K, kidney; Li, liver; Lu, lung and Sk, skeletal muscle.