
Biased exon/intron distribution of cryptic and de novo 3' splice sites.

Jana Královicová, Mikkel B Christensen, Igor Vorechovský.

Abstract

We compiled sequences of previously published aberrant 3' splice sites (3'ss) that were generated by mutations in human disease genes. Cryptic 3'ss, defined here as those resulting from a mutation of the 3'YAG consensus, were more frequent in exons than in introns. They clustered in an approximately 20 nt region adjacent to authentic 3'ss, suggesting that their under-representation in introns is due to a depletion of AG dinucleotides in the polypyrimidine tract (PPT). In contrast, most aberrant 3'ss that were induced by mutations outside the 3'YAG consensus (designated 'de novo') were in introns. The activation of intronic de novo 3'ss was largely due to AG-creating mutations in the PPT. In contrast, exonic de novo 3'ss were more often induced by mutations improving the PPT, branchpoint sequence (BPS) or distant auxiliary signals, rather than by direct AG creation. The Shapiro-Senapathy matrix scores had a good prognostic value for cryptic, but not de novo 3'ss. Finally, AG-creating mutations in the PPT that produced aberrant 3'ss upstream of the predicted BPS in vivo shared a similar 'BPS-new AG' distance. Reduction of this distance and/or the strength of the new AG PPT in splicing reporter pre-mRNAs improved utilization of authentic 3'ss, suggesting that AG-creating mutations that are located closer to the BPS and are preceded by a weaker PPT may result in less severe splicing defects.

Year:  2005        PMID: 16141195      PMCID: PMC1197134          DOI: 10.1093/nar/gki811

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The production of mature RNA in eukaryotes requires an accurate removal of intervening sequences or introns by splicing. Splicing of precursor (pre-)mRNA is facilitated by a large complex of small nuclear ribonucleoprotein particles (U1, U2, U4/U6 and U5 snRNPs) and a number of non-snRNP proteins that assemble on primary transcripts in a step-wise manner [reviewed in (1)]. Assembly of spliceosomal complexes requires the presence of conserved recognition sequences in the pre-mRNA: 5′ splice site (5′ ss; consensus MAG/GURAGU; M is A or C; R is purine), 3′ splice site (3′ss; consensus YAG/R; Y is pyrimidine), branchpoint sequence (BPS; mammalian consensus YNCURAY) and polypyrimidine tract (PPT). In addition to these signals, efficient intron removal often entails auxiliary sequences that repress or activate splicing, termed splicing silencers or enhancers, which function as binding sites for numerous factors, such as serine/arginine-rich (SR) proteins (2–5). Alterations in any of these cis-elements by mutations may severely impair pre-mRNA splicing and gene expression. Mutations that affect splicing have been shown to account for up to half of disease-causing gene alterations (6,7). Since longer proteins are more likely to be involved in genetic disorders than shorter proteins and disease genes have, on average, a longer coding sequence and a higher number of introns than genes not causing recognizable phenotypes, it was hypothesized that splicing mutations may represent the most frequent cause of hereditary disease (8). The most common outcome of mutations affecting splice sites is exon skipping, followed by cryptic splice site activation (9,10). Cryptic 5′ ss are more common than cryptic 3′ss (9), but their unequal prevalence has not been understood. Cryptic 5′ ss have a similar frequency distribution in exons and introns (11). 
Because recognition of 3′ss involves additional conserved elements in the intron (PPT and BPS), the distribution of aberrant 3′ss in exons and introns would be expected to reflect a more complex sequence context of the 3′ss. However, since the initial analysis of single-nucleotide substitutions in splice junctions (10) and a survey of 15 cryptic 3′ss in 1994 (9), no reliable data have been available in the literature. Here, we have compiled sequences of previously published aberrant 3′ss in human disease genes. Our analysis revealed a biased distribution of cryptic 3′ss generated by mutations in the 3′ YAG towards exons and de novo 3′ss towards introns. We propose that the former can be fully explained by a depletion of AG dinucleotides in the PPT, while the latter is due to a lack of pyrimidine stretches downstream of authentic 3′ss. In addition, we have investigated a group of disease-causing mutations that create AG dinucleotides in the PPT and activate aberrant 3′ss upstream of BPS. We found that they shared a similar distance between predicted BPS and newly introduced AGs and show that reduction of this distance and/or the strength of the new PPT enhanced the expression of natural transcripts. These results improve prediction of aberrant 3′ss localization in human disease genes and suggest that inspection of single-nucleotide substitutions near the 3′ss in their sequence context may facilitate prediction of their splicing outcome.

MATERIALS AND METHODS

Compilation of cryptic and de novo 3′ss

Published reports of cryptic and de novo 3′ss were identified by searching PubMed, locus-specific mutation databases and home pages of genetics journals. The search was restricted to human genes with sequence-verified aberrant RNA products published before May 2005 that resulted from disease-associated mutations or variants. In the majority of cases, these alterations were not found in DNA samples from unaffected individuals and/or showed co-segregation with affected family members in the pedigrees. The aberrant 3′ss were manually validated by mapping the information in the literature to sequences in the Human Genome project databases. Sequences of authentic, mutant and aberrant 3′ss together with the Shapiro–Senapathy (S&S) scores are available in Supplementary Tables 1 (exonic 3′ss) and 2 (intronic 3′ss). The S&S consensus matrix scores were computed using an algorithm described previously (12,13). To assess the significance of score values, the non-parametric Mann–Whitney rank test was employed as described previously (11). Sequences containing AG dinucleotides between predicted BPS and authentic 3′AG (termed ‘intervening AGs’) were extracted from a collection of 46 807 constitutively spliced human introns (14). The number of nucleotides that preceded and followed intervening AGs was computed using Perl (v. 5.6.1) scripts available on request.
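
The Perl scripts themselves are not reproduced in the text. As a rough illustration of the counting step only, the Python sketch below finds AG dinucleotides between a predicted BP adenosine and the authentic 3′ AG and reports how many nucleotides precede and follow each one. The function name, the 0-based coordinate convention and the example sequence are assumptions made for illustration and are not the authors' implementation.

```python
def intervening_ags(intron_3prime, bp_index):
    """Return (upstream_nt, downstream_nt) for each AG between the BP and the 3' AG.

    intron_3prime: intron sequence (DNA alphabet) ending with the authentic AG.
    bp_index: 0-based index of the predicted branch point adenosine
              (assumed to be known from a separate prediction step).
    """
    seq = intron_3prime.upper()
    if not seq.endswith("AG"):
        raise ValueError("sequence must end with the authentic 3' AG")
    authentic_start = len(seq) - 2  # index of the A in the authentic AG
    hits = []
    # Scan only positions strictly between the BP and the authentic AG,
    # stopping early enough that a hit cannot overlap the authentic AG.
    for i in range(bp_index + 1, authentic_start - 1):
        if seq[i:i + 2] == "AG":
            upstream = i - bp_index - 1            # nt between BP and this AG
            downstream = authentic_start - (i + 2)  # nt between this AG and the authentic AG
            hits.append((upstream, downstream))
    return hits


# Hypothetical intron end: BP adenosine at index 5, one intervening AG.
example = "CTCTAACTTTTAGTTTTCTTTTCCAG"
print(intervening_ags(example, bp_index=5))
```

Applied to a whole intron collection, the same function could be called once per intron and the resulting pairs pooled before summarising.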

Splicing reporter constructs, cell culture and transfections

Oligonucleotide primers for cloning LIPC, HEXB, FBN2, TH and TSC2 reporter minigenes (pCR3.1; Invitrogen) are shown in Supplementary Table 3. Site-directed mutagenesis was carried out as described previously (15). All wild-type and mutated constructs were validated by sequencing as described previously (16). Transient transfections were performed in 12-well plates using FuGENE 6 (Roche). Human embryonic kidney 293T cells were grown under standard conditions in RPMI1640 supplemented with 10% (v/v) fetal calf serum (Gibco BRL). The plating density was ∼10⁵ cells per well 17–24 h before transfection. Medium was changed 2 h before adding a DNA mixture prepared by combining 1.5 µl of FuGENE and 50 µl of serum-free medium, followed by addition of 0.5 µg purified plasmid DNA. For co-transfections, 0.5 µg of reporter DNA was mixed with 1 µg of plasmids expressing SR proteins obtained as described previously (15). Cells were harvested 48 h post-transfection.

Detection of mRNA products

Total RNA was extracted as described previously (15), treated with DNase I (Ambion) and reverse transcribed using oligo(dT)15 primers and Moloney murine virus reverse transcriptase (Promega) according to the manufacturer's recommendations. Three microlitres of each cDNA reaction together with negative and positive controls were amplified with vector-specific PCR primers as described previously (15). The number of PCR cycles was 29 or lower to maintain an approximately linear relationship between the RNA input and signal. PCR products were separated on polyacrylamide gels and stained with ethidium bromide. Transcript levels were measured with FluorImager 595 using FluorQuant and Phoretix software (Nonlinear Dynamics Inc.). To confirm the identity of each product, visualized fragments were extracted from the gels and sequenced as described previously (16).

RESULTS

Biased distribution of cryptic and de novo 3′ss

To maintain the previous categorization of aberrant splice sites (11), cryptic 3′ss are defined as those that are only used when a mutation disrupts use of the 3′ss consensus YAG. In contrast, the term de novo refers here to all aberrant 3′ss that were induced by mutations elsewhere than in the 3′YAG (Table 1). They include newly formed 3′ss AGs that are used instead of the natural one, or result from mutations that improve BPS, PPT or auxiliary splicing signals of the new 3′ss. For simplicity, a small number of aberrant 3′ss upstream of BPS that were generated by AG-creating mutations in the PPT were also included in the latter category, although these mutations do not improve the intrinsic strength of the new 3′ss, but instead interfere with the correct utilization of the natural site.
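
The two categories can be expressed as a simple rule on the position of the causal mutation. The sketch below is a minimal illustration in Python, assuming the IVS notation used in Tables 2 and 3 and handling only single-nucleotide substitutions written as, for example, IVS3-2A>G; the function name and the regular expression are hypothetical and were not part of the original analysis.

```python
import re

# Matches a single-nucleotide substitution in the 3' part of an intron,
# e.g. "IVS3-2A>G" or "IVS17a-26A>G"; group 1 is the distance from the exon.
_INTRONIC_SUBSTITUTION = re.compile(r"^IVS\d+[a-z]?-(\d+)[ACGT]>[ACGT]$")


def classify_aberrant_3ss(mutation):
    """Return 'cryptic' if the mutation hits the 3' YAG (positions -1 to -3),
    otherwise 'de novo' (any other mutation, including exonic ones)."""
    match = _INTRONIC_SUBSTITUTION.match(mutation)
    if match and 1 <= int(match.group(1)) <= 3:
        return "cryptic"
    return "de novo"


for name in ["IVS3-2A>G", "IVS10-17A>G", "E16+1G>C", "IVS17a-26A>G"]:
    print(name, classify_aberrant_3ss(name))
```

Deletions, duplications and entries with incomplete coordinates fall through to 'de novo' in this sketch, so such entries would need manual review.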
Table 1

Summary of aberrant 3′ splice sites in human genes

Location of cryptic or de novo splice sites | Exon | Exon | Intron | Intron | Total
Mutation | In 3′YAG (cryptic) | Outside 3′YAG (‘de novo’) | In 3′YAG (cryptic) | Outside 3′YAG (‘de novo’) |
Number of genes | 39 | 16 | 14 | 35 | 89
Number of cryptic/de novo 3′ss (% in exons and introns) | 59 (74) | 21 (26) | 18 (27) | 49 (73) | 147
Number of unique 3′ss (%) | 56 (75) | 19 (25) | 18 (27) | 48 (73) | 141
Reading frame
    0 | 18 | 7 | 5 | 19 | 49 (33.3%)
    +1 | 23 | 5 | 8 | 15 | 51 (34.7%)
    +2 | 18 | 9 | 5 | 15 | 47 (32.0%)
Average distance (nt) between authentic and cryptic 3′ss (SD) | 43.8 (46.4)ᵇ | 57.9 (29.9) | −71.6 (120.3) | −19.1 (33.7) | 10.7 (67.2)ᵇ
Median distance (nt) between authentic and cryptic 3′ss | 12 | 54 | −44 | −12 | 2
Number of terminal exons (%) | 10 (17) | 2 (10) | 6 (33) | 3 (6) | 21 (14)
Average S&S score (SD)
    Authentic (A) | 85.3 (6.1) | 80.8 (10.0) | 87.6 (4.9) | 81.3 (6.6) | 83.6 (7.2)
    Mutated (M) | 68.4 (8.0) | 80.2 (9.7) | 72.3 (5.3) | 78.8 (8.4) | 74.1 (9.5)
    Cryptic/de novo (CR) | 74.2 (8.2) | 82.3 (7.6) | 83.8 (6.0) | 77.6 (7.9) | 77.7 (8.5)
Average differenceᵃ
    A-M (P-value) | 16.9 (2 × 10⁻¹⁷) | 0.9 (N.S.) | 15.1 (9 × 10⁻¹⁰) | 2.5 (N.S.) | 9.6 (10⁻¹⁸)
    M-CR (P-value) | 5.7 (3 × 10⁻⁴) | 2.0 (N.S.) | 11.4 (10⁻¹⁶) | −1.2 (N.S.) | −3.6 (6 × 10⁻⁴)
    A-CR (P-value) | 11.2 (10⁻¹⁸) | −1.2 (N.S.) | 3.8 (0.07) | 3.7 (0.04) | 6.0 (10⁻¹⁹)

ᵃ Mann–Whitney rank test (SPSS, SPSS Inc., USA).

ᵇ Excluding an outlier of 1165 nt (48).

SD, standard deviation; N.S., not statistically significant.

Compilation of previously published human aberrant 3′ss that were determined by sequencing mutated transcripts identified 147 cryptic and de novo sites in 89 genes (80 in exons and 67 in introns; Tables 1–3). A total of 77 cryptic 3′ss were activated as a result of point mutations in the 3′ YAG (Table 1). Thirty-eight point mutations were in intron position –1, position –2 was mutated in 33 cases and position –3 in 6 cases. Cryptic 3′ss were more frequent in exons (n = 59; P < 10⁻⁴) than in introns. Three mutations produced cryptic 3′ss both in introns and in exons (17–19). Distribution of distances between authentic and cryptic 3′ss (Figure 1A) showed that 49 of the 59 (83%) exonic cryptic 3′ss were located within 21 nt just downstream of the intron–exon boundaries. The region of the same size upstream of 3′ss, which corresponds to an experimentally determined optimal distance between the branch point (BP) adenosine and 3′ss (20), is depleted of AG dinucleotides in several species, including humans (21,22). When cryptic 3′ss between authentic 3′AG and position +21 in the downstream exon were disregarded, the distribution of the remaining cryptic 3′ss was no longer biased towards exons and resembled a normal distribution (Figure 1B).
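
The text does not state which test produced the P-value for the exon/intron bias. One plausible check, shown here only as a hedged sketch, is an exact two-sided binomial test of 59 exonic sites out of 77 cryptic 3′ss against a null of equal probability for exons and introns. Both the choice of test and the 0.5 null are assumptions; the function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial P-value for k successes in n trials at p = 0.5.

    For a symmetric null the two-sided value is twice the larger tail,
    capped at 1.
    """
    tail_start = max(k, n - k)
    upper_tail = sum(comb(n, i) for i in range(tail_start, n + 1))
    return min(1.0, 2 * upper_tail / 2 ** n)


# 59 of 77 cryptic 3'ss were exonic (counts taken from the text above).
p_value = binomial_two_sided_p(59, 77)
print(f"P = {p_value:.2e}")
```

Under these assumptions the result falls below the 10⁻⁴ threshold quoted in the text, although a null that accounts for the different sizes of exonic and intronic target regions could give a different value.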
Table 3

Cryptic and de novo 3′ splice sites in introns

Gene | Phenotype | Mutation | Location of cryptic 3′ss | Reference
APOE | ApoE deficiency | IVS3-2A>G | IVS3-52 | (73)
AR | Androgen insensitivity | IVS2-11T>A | IVS2-69 | (23)
ATM | Ataxia telangiectasia | IVS32-12A>G | IVS32-11 | (6)
ATM | Ataxia telangiectasia | IVS16-10T>G | IVS16-9 | (6)
ATP7B | Wilson disease | IVS11-2A>G | IVS11-39 | (146)
BRCA1 | Breast cancer | IVS7-24del10 | IVS7-59 | (147)
BRCA1 | Familial breast cancer | IVS5-12A>G | IVS5-11 | (148)
CFTR | Cystic fibrosis | IVS17a-26A>G | IVS17a-25 | (149,150)
CHIT1 | Chitotriosidase deficiency | E10+20dupl24 | E10+103 | (151)
COL17A1 | Benign epidermolysis bullosa | IVS31-1G>T | IVS31-69, IVS31-264 | (17)
COL5A1 | Ehlers-Danlos syndrome type II | IVS13-2A>G | IVS13-100 | (94)
CPO | Hereditary coproporphyria | IVS1-15C>G | IVS1-14 | (152)
CYBB | Chronic granulomatous disease | IVS4-15del36 | IVS4-179 | (153)
CYP21B | 21-hydroxylase deficiency | IVS2-13C>G | IVS2-19, IVS2-33 | (154)
DBT | Maple syrup urine disease | IVS4-17TTT>AAA | IVS4-17 | (155)
DMD | Muscular dystrophy | IVS59-9T>A | IVS59-7 | (156)
DMD | Muscular dystrophy | IVS8-15A>G | IVS8-14 | (48)
ELN | Supravalvular aortic stenosis | IVS15-3C>G | IVS15-44 | (93)
ERCC3 | Xeroderma pigmentosum | IVS14-6C>A | IVS14-4 | (157)
FANCA | Fanconi anaemia | IVS15-1G>T | IVS15-90 | (158)
GCH1 | Dystonia | IVS2-2A>G | IVS2-1 | (159)
HBB | Beta-thalassaemia | IVS1-15T>G | IVS1-14 | (160)
HBB | Beta-thalassaemia | IVS1-21G>A | IVS1-19 | (161163)
HBB | Beta-thalassaemia | IVS2-A>G | IVS2-271 | (74)
HEXB | Sandhoff disease | IVS10-17A>G | IVS10-37 | (19)
HEXB | Sandhoff disease | IVS12-26G>A | IVS12-24 | (24,164)
HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS8-3T>G | IVS8-2 | (165)
HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS8-16G>A | IVS8-14 | (165,166)
ITGB2 | Leukocyte adhesion deficiency | IVS6-14C>A | IVS6-12 | (167)
L1CAM | X-linked hydrocephalus | IVS17-19A>C | IVS-69 | (168)
L1CAM | X-linked hydrocephalus, MASA syndrome, spastic paraplegia | IVS26-12G>A | IVS26-10 | (169)
LIPC | Hepatic lipase deficiency | IVS1-14A>G | IVS1-78, IVS1-13 | (52)
MECP2 | Rett syndrome | IVS1-6C>G | IVS1-5 | (170)
MLYCD | Malonyl-CoA decarboxylase deficiency | IVS4-14A>G | IVS4-13 | (171)
MPO | Myeloperoxidase deficiency | IVS11-2A>C | IVS11-109 | (72)
MTM1 | X-linked recessive myotubular myopathy | IVS12-10A>G | IVS12-9 | (172)
MYBPC3 | Hypertrophic cardiomyopathy | IVS14-13G>A | IVS14-11 | (173)
NF1 | Neurofibromatosis type I | IVS15-16A>G | IVS15-15 | (7,25)
NF1 | Neurofibromatosis type I | IVS15-15A>G | IVS15-14 | (25)
NF1 | Neurofibromatosis type I | IVS10a-9T>A | IVS10a-7 | (7,25)
NF1 | Neurofibromatosis type I | IVS39-12T>A | IVS39-10 | (80)
NF1 | Neurofibromatosis type I | IVS26-2A>T | IVS26-14, IVS26-17 | (80)
NF1 | Neurofibromatosis type I | IVS11-3C>G | IVS11-43 | (7)
NF1 | Neurofibromatosis type I | IVS15-12T>G | IVS15-11 | (25)
OCA2 | Type II oculocutaneous albinism | IVS5-19A>G | IVS5-18 | (174)
PAH | Phenylketonuria | IVS8-7A>G | IVS8-6 | (66)
PAH | Phenylketonuria | IVS10-11G>A | IVS10-9 | (175)
RB1 | Retinoblastoma | IVS22-8T>A | IVS22-6 | (176)
SAG (S-ARRESTIN) | Retinitis pigmentosa | IVS10-25A>G | IVS1-24 | (177)
SALL1 | Townes-Brocks syndrome | IVS2-19T>A | IVS2-17 | (178)
SERPINC1 | Type I antithrombin deficiency | IVS4-14G>A | IVS4-12 | (179)
SOD1 | Amyotrophic lateral sclerosis | IVS4-10T>G | IVS4-9 | (180)
TCF1 (HCF-1A) | Maturity-onset diabetes of the young | IVS4-2A>G | IVS4-202 | (181)
TCF1 (HCF-1A) | Maturity-onset diabetes of the young | IVS7-6G>A | IVS7-4 | (181)
TH | Extrapyramidal movement disorder | IVS11-24T>A | IVS11-36 | (57)
TNFRSF1A | Periodic fever syndrome | IVS3-14G>A | IVS3-12 | (26)
TP53 | Li-Fraumeni syndrome | IVS9-1G>C | IVS9-44 | (92)
TP53 | Li-Fraumeni syndrome | IVS3-11C>G | IVS3-10 | (139)
TPMT | Thiopurine methyltransferase deficiency | IVS9-1G>A | IVS9+1, IVS9-330 | (18)
WRN | Werner's syndrome | IVS29-7T>A | IVS29-5 | (182)
ZAP70 | Severe combined immunodeficiency | IVS9-11G>A | IVS9-9 | (183)

Figure 1

Distribution of the distances between authentic and aberrant 3′ss. (A) Cryptic splice sites resulting from mutations at the 3′YAG consensus. (B) Cryptic splice sites resulting from mutations of the 3′YAG, except for cryptic 3′ss in exon positions 1–21. (C) Aberrant 3′ss due to mutations outside the 3′YAG (‘de novo’). The Stata statistical package (v. 8.2, StataCorp, TX) was used to fit kernel density plots to the distances between authentic and cryptic/de novo 3′ss. Positive and negative numbers correspond to aberrant 3′ss located in the downstream exon or the upstream intron, respectively. The number of occurrences of aberrant 3′ss is shown as short vertical bars for each distance (in nt). The corresponding scale is shown on the right side. A cryptic splice site that was found at a large distance from the authentic 3′ss (48) was omitted from the plot.

Seventy 3′ss resulted from mutations outside the 3′ YAG consensus (Table 1). Most of them were in introns (n = 48; P < 0.005), with 41 of the 48 (85%) located within 25 nt upstream of the authentic 3′ss (Figure 1C). The majority of newly created AGs in introns that were used in the catalytic step were preceded by a strong PPT, although there were several exceptions (23–26). In addition to a major frequency peak at ∼19 nt upstream of the authentic 3′ss, a second small peak was observed ∼58 nt downstream of authentic 3′ss (Figure 1C and Table 1), but the number of these 3′ss was low. The second peak is likely to result from a depletion of de novo sites in the first ∼25 nt of the exon, rather than to their absolute over-representation further downstream. Such depletion could be due to interference by complexes assembled at the authentic 3′ss, selection against codons carrying AGs in the 5′ end of internal exons (27) or a lack of suitable BP adenosine within an optimal distance from de novo 3′ss. Although intronic de novo 3′ss were almost exclusively generated by AG-creating mutations, such mutations contributed much less to the formation of exonic de novo 3′ss. Apart from three examples of direct AG creation in exons (28–30), three de novo sites were produced by point mutations in position –3 (31–33), one in position –5 (34) and three in position –6 (35–37) relative to the new intron–exon junction. Three aberrant exonic 3′ss resulted from mutations of the predicted BP adenosine (38,39) or conserved uridine in position –2 relative to BP (40), known hot-spots of single-nucleotide substitutions in the human BPS (15). Several aberrant 3′ss in exons resulted from more distant mutations (39,41), highlighting the importance of PPT, BPS and distant auxiliary splicing signals for the activation of 3′ss in this category. 
Together, these results indicated that, in contrast to cryptic 3′ss, distribution of de novo acceptors was biased towards introns, particularly towards the PPT. Unlike intronic 3′ss, which largely resulted from AG-creating mutations, de novo 3′ss in exons were commonly generated by mutations elsewhere than in the 3′YAG of the new intron–exon boundary. Sequence alignments of aberrant 3′ss in each category (Figure 2A–D) revealed a higher purine content for cryptic 3′ss in exons (Figure 2A) as compared with all human 3′ss (42) or corresponding authentic 3′ss (Figure 2E). Adenosine in position –3 was over-represented among exonic cryptic 3′ss (Figure 2A), possibly just reflecting higher levels of purine residues in the sequences surrounding the new intron–exon junction. The number of aberrant 3′ss in the next two categories (Figure 2B and C) was low. Intronic de novo 3′ss (Figure 2D) had frequent uridine in position +1 of the new exon as well as pyrimidines in position −4 of the new intron. Finally, purine depletion observed for aberrant 3′ss in the new PPT was in similar PPT positions as in authentic 3′ss (Figure 2E) or all 3′ss in vertebrates (42). This points to similar requirements for interactions with poly(Y) binding proteins, such as the large subunit of U2 auxiliary factor (U2AF65), and is consistent with frequent manifestation of splicing phenotypes as a result of mutations or naturally occurring DNA variants in these PPT positions (15,43–47).
Figure 2

Consensus sequences of aberrant 3′ss. (A) Cryptic splice acceptors in exons resulting from mutations of the 3′ YAG; (B) 3′ss in exons due to mutations outside the 3′ YAG; (C) cryptic 3′ss in introns that resulted from mutations of the 3′ YAG; (D) aberrant 3′ss located in introns generated by mutations outside the 3′ YAG. (E) Consensus sequences of corresponding authentic 3′ss (n = 147). The relative nucleotide frequencies at each position were plotted with a pictogram utility. The height of each letter is proportional to the frequency of the corresponding base at the given position. Arrows indicate over-representation of adenine in position −3 (A) and of uridine in position +1 (D) of the new intron or exon, respectively.

The median distance between authentic and aberrant 3′ss was 2 nt (Table 1, each distance is shown in Supplementary Tables 1 and 2), while the absolute median distance was 16 nt. The median distances from authentic sites to exonic cryptic 3′ss or intronic de novo 3′ss were similar (12 nt; Figure 1 and Table 1). The median distances between intronic cryptic 3′ss and exonic de novo sites were also comparable (Table 1). Occasionally, mutations activated a cryptic 3′ss in an exon further downstream, up to 1165 nt from the mutation (48). In this extreme case, the authentic 3′ss was preceded by a very strong PPT, separating the 3′ss and the first upstream adenosine by 64 nt (or a predicted distant BP by 68 nt), which might have contributed to cryptic 3′ss selection at such a long distance from the authentic 3′ss. We also found that several identical aberrant 3′ss in exons were activated by distinct mutations, such as HEXB E11+112 by E11+8C>T and IVS10-17A>G (19,49), or F8 E16+47 by a mutation at IVS15-1 and an exon 16 mutation (38,50) (Table 2), providing clear evidence that mutations in very diverse positions may result in the same splicing defect.
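
The difference between the signed median and the absolute median arises because exonic (positive) and intronic (negative) distances partly cancel. The Python sketch below illustrates this with six site locations taken from Tables 2 and 3; it is a small subset chosen for illustration and is not expected to reproduce the 2 nt and 16 nt values. The parser and its notation handling are assumptions.

```python
import re
from statistics import median

# "E2+11" -> +11 (exonic), "IVS3-52" -> -52 (intronic).
_LOCATION = re.compile(r"^(?:E|IVS)\d+[a-z]?([+-]\d+)$")


def signed_distance(location):
    """Parse a site location into a signed distance from the authentic 3'ss."""
    match = _LOCATION.match(location)
    if match is None:
        raise ValueError(f"unrecognised location: {location}")
    return int(match.group(1))


locations = ["E2+11", "E15+7", "E39+61", "IVS3-52", "IVS32-11", "IVS11-39"]
distances = [signed_distance(loc) for loc in locations]

print("signed median:", median(distances))
print("absolute median:", median(abs(d) for d in distances))
```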
Table 2

Cryptic and de novo 3′ splice sites in exons

Gene | Phenotype | Mutation | Location of cryptic 3′ss | Reference
ABCR (ABCA4) | Stargardt disease | E16+1G>C | E16+3 | (31)
ACAT1 | Mitochondrial acetoacetyl-CoA thiolase deficiency | E5+46C>T | E5+51 | (35)
ALG8 | Glycosylation deficiency | IVS1-2A>G | E2+11 | (96)
ARSA | Metachromatic leukodystrophy | E8+22C>T | E8+27 | (37)
ASS | Citrullinaemia | IVS14-1G>C | E15+7 | (97,98)
ATM | Ataxia telangiectasia | IVS38-2A>C | E39+61 | (6)
ATM | Ataxia telangiectasia | IVS64-1G>C | E65+13 | (6)
BRCA2 | Breast cancer | IVS23-2A>G | E24+7 | (99)
BTD | Biotinidase deficiency | E1+56G>A | E1+57 | (30)
CBFA2 (RUNX1) | Familial thrombocytopenia | IVS20-1G>T | E21+13 | (100)
CDKL5 | Rett syndrome | IVS13-1G>A | E14+1 | (101)
CLN3 | Batten disease | IVS15-1G>T | E16+5 | (102)
COH1 | Cohen syndrome | IVS51-1G>T | E52+16 | (103)
COL17A1 | Epidermolysis bullosa | IVS31-1G>T | E32+9 | (17)
COL17A1 | Epidermolysis bullosa simplex | IVS21-2A>C | E22+27 | (104)
COL1A2 | Ehlers-Danlos syndrome type VII | IVS5-1G>C | E6+15 | (105)
COL1A2 | Osteogenesis imperfecta | IVS27-2A>G | E28+46 | (106)
COL2A1 | Stickler syndrome | IVS17-2A>G | E18+16 | (107)
COL5A1 | Ehlers-Danlos syndrome type I | IVS4-2A>G | E5+12, E5+15 | (108)
DAF | CD55 deficiency | E5+18C>T | E5+44 | (40)
DMD | Dystrophinopathy | IVS20-2A>G | E21+7 | (109)
DMD | Muscular dystrophy | IVS74-2A>G | E76+60 | (48)
EPB42 | Recessive hereditary spherocytosis | E11+39G>T | E11+41 | (33)
F8 | Haemophilia A | E11+32G>T | E11+36 | (34)
F8 | Haemophilia A | E16+26G>A | E16+47 | (38)
F8 | Haemophilia A | IVS15+1G>T | E16+47 | (50)
FGB | Hypofibrinogenaemia | E4+115T>A | E4+116 | (28)
G6PC | Glycogen storage disease type 1a | E5+86G>T | E5+91 | (36)
G6PD | Glucose-6-phosphate dehydrogenase deficiency | IVS10-2A>G | E11+9 | (110)
GH-1 | Growth hormone deficiency | IVS3del28-45 | E3+98 | (111)
GLA | Fabry disease | IVS3-1G>A | E4+1 | (112)
GLA | Fabry disease | IVS6-1G>A | E7+1 | (113)
GPB | Henshaw antigen | E5+65C>G | E5+65 | (114)
HEXB | Sandhoff disease | E11+8C>T | E11+112 | (49)
HEXB | Sandhoff disease | IVS10-17A>G | E11+112 | (19)
HLA-B*3916 | Deficient expression of HLA-B | E3+17G>C | E3+19 | (32)
HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS1-2A>G | E2+5 | (115)
HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS5-1G>A | E6+1 | (115)
HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS7-1G>A | E8+21 | (115)
HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS9-1G>A | E10+17 | (116)
HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS9-2A>G | E10+17 | (117,118)
HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS9-2A>T | E10+17 | (117,118)
INSR | Rabson-Mendenhall's syndrome | IVS4-2A>G | E5+12 | (119)
ITGA2B | Glanzmann thrombasthenia | IVS3-3DEL13 | E4+18 | (120)
ITGB4 | Epidermolysis bullosa | IVS31-19T>A | E32+38 | (41)
KRT14 | Recessive epidermolysis bullosa simplex | IVS2-2A>C | E3+10 | (121)
LAMA2 | Muscular dystrophy | IVS28-1G>C | E29+69 | (122)
LAMC2 | Junctional epidermolysis bullosa | IVS3-1G>A | E3+2 | (123)
LDLR | Familial hypercholesterolaemia | IVS1-1G>C | E2+10 | (124)
LDLR | Familial hypercholesterolaemia | IVS7-1G>C | E8+17 | (125)
LDLR | Familial hypercholesterolaemia | IVS9-1G>A | E10+7 | (126)
LDLR | Familial hypercholesterolaemia | IVS9-30 GTGCTGATG>CGGCT | E10+54 | (127)
LHX4 | Syndromic short stature | IVS4-1G>C | E5+12, E5+20 | (128)
MANBA | Beta-mannosidosis | IVS15-2A>G | E16+172 | (81)
NF1 | Neurofibromatosis type 1 | IVS27b-2A>T | E28+293 | (80)
NIS (SLC5A5) | Congenital hypothyroidism | E13+67C>G | E13+67 | (29)
OAS1 | Oligoadenylate synthase activity | IVS6-1A>G | E7+1, E7+137 | (129)
OTC | Ornithine transcarbamylase deficiency | IVS4-2A>T | E5+12 | (130)
PDE6B | Autosomal recessive retinitis pigmentosa | IVS2-1G>T | E3+12 | (131)
PFKM | Muscle phosphofructokinase deficiency | IVS6-2A>C | E7+5, E7+12 | (132)
PKLR | Pyruvate kinase deficiency | IVS3-2A>T | E4+6 | (133)
PTPS | Pyruvoyl-tetrahydropterin synthase deficiency | IVS1-3C>G | E2+12 | (134)
SCNN1G | Pseudohypoaldosteronism type 1 | IVS2-1G>A | E3+6 | (135)
SPG4 | Spastic paraplegia | IVS6-1G>A | E7+8 | (136)
SPINK5 | Netherton syndrome | IVS20-1G>A | E21+1 | (137)
TNFSF5 (HIGM1) | X-linked hyper-IgM syndrome | IVS4-2A>G | E5+8 | (138)
TP53 | Li-Fraumeni syndrome | IVS3-1G>A | E4+19 | (139,140)
TP53 | Li-Fraumeni syndrome | IVS5-11DEL11 | E6+17 | (141)
TP53 | Li-Fraumeni syndrome | IVS5-1G>A | E6+1 | (139)
TP53 | Lung cancer | IVS3-1G>C | E4+19 | (142)
TSC2 | Tuberous sclerosis | IVS38-18G>A | E39+74 | (39)
TSC2 | Tuberous sclerosis | IVS9-15G>A | E10+56 | (39)
TSC2 | Tuberous sclerosis | IVS9-3C>G | E10+56 | (39)
UGT1A1 | Crigler-Najjar syndrome type 1 | IVS3-2A>G | E4+107 | (143)
UGT1A1 | Crigler-Najjar syndrome type 1 | IVS4-1G>A | E5+7 | (144)
XPA | Xeroderma pigmentosum group A | IVS3-1G>C | E4+2 | (145)

As with the cryptic 5′ ss (11), neither cryptic nor de novo 3′ss showed any bias towards a particular reading-frame phase relative to the position of authentic 3′ss (Table 1). This may reflect only a partial elimination of mRNAs with premature termination codons by nonsense-mediated mRNA decay or even a complete unresponsiveness of mutated transcripts to RNA surveillance mechanisms, as reported for HBB (51). Finally, similar to the cryptic 5′ ss (11), pair-wise comparisons of the average S&S matrix scores for authentic 3′ ss and their mutated and cryptic counterparts showed significant differences, with the identical A>CR>M score hierarchy for both exonic and intronic cryptic 3′ ss (Table 1 and Supplementary Tables 1 and 2). In contrast, the average S&S matrix scores for de novo 3′ ss were not significantly higher than corresponding mutated sites and were not lower than corresponding authentic 3′ ss, except for a borderline significance of intronic de novo 3′ ss (Table 1). Thus, the predictive value of the S&S matrix scores was evident only for cryptic, but not for de novo 3′ss.
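
The reading-frame tally in Table 1 can be approximated by taking each signed distance modulo 3. The sketch below is a hedged illustration: the mapping of remainders onto the table's 0, +1 and +2 labels is an assumption (the sign convention is not spelled out in the text), and the example distances are the same small subset of Tables 2 and 3 used purely for demonstration.

```python
from collections import Counter


def frame_phase(distance_nt):
    """Return 0, 1 or 2: the signed distance (in nt) modulo 3.

    Phase 0 means the aberrant 3'ss keeps the reading frame of the
    authentic site; 1 and 2 indicate a frameshift.
    """
    return distance_nt % 3  # Python's % is non-negative for a positive modulus


def tally_phases(distances):
    """Count how many aberrant sites fall into each phase."""
    counts = Counter(frame_phase(d) for d in distances)
    return {phase: counts.get(phase, 0) for phase in (0, 1, 2)}


# Signed distances for six example sites (positive = exonic, negative = intronic).
example = [12, 7, 61, -52, -11, -39]
print(tally_phases(example))
```

A roughly even split across the three phases, as in Table 1, is what would be expected if frame-shifted transcripts were not preferentially eliminated.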

AG-creating mutations in the PPT that suppress authentic 3′ss and activate aberrant 3′ss: a role for ‘BPS-new AG’ distance

The majority of AG-creating mutations in the PPT that resulted in the activation of aberrant 3′ss used the newly introduced AGs in the second step of splicing (Figure 1C). However, there were several exceptions. Newly created AGs in LIPC (52), HEXB (19) and AR (23) were not efficiently used for exon ligation, but led to the activation of aberrant 3′ss upstream of putative BPS, while suppressing authentic 3′ss. A similar repression of authentic 3′ss was observed for FBN2, where the introduction of AG, which was not used in the catalytic step in vivo, resulted in exon skipping (53). In this small group of mutations, recognition of new AGs was sufficient to suppress utilization of authentic 3′ss, but was insufficient for exon joining. Sequence inspection of these cases revealed that even though the distance between newly created AGs and authentic 3′ss was variable, the distance between predicted BPS and the mutation was similar, ranging from 11 to 15 nt (Figure 3A).
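
To make the ‘BPS-new AG’ measurement concrete, the following Python sketch computes the offset between a predicted BP adenosine and an AG-creating substitution. It assumes that the BP adenosine is the sixth nucleotide of the predicted heptamer (as in the YNCURAY consensus) and that the distance is counted from that adenosine to the mutated nucleotide; the paper may use a slightly different convention. The heptamer GGCTAAG is the putative LIPC BPS listed in the Figure 3 legend, but the flanking sequence below is invented for illustration.

```python
def bps_to_new_ag_distance(wild_type, bps, mut_index, mut_base, bp_offset=5):
    """Distance (nt) from the predicted BP adenosine to an AG-creating mutation.

    wild_type: wild-type intron sequence (DNA alphabet).
    bps: predicted branchpoint heptamer; first occurrence is used.
    mut_index: 0-based index of the substituted nucleotide.
    mut_base: the new nucleotide.
    bp_offset: position of the BP adenosine inside the heptamer (assumed 5).
    """
    seq = wild_type.upper()
    bps_start = seq.find(bps.upper())
    if bps_start < 0:
        raise ValueError("predicted BPS not found in sequence")
    bp_index = bps_start + bp_offset
    mutated = seq[:mut_index] + mut_base.upper() + seq[mut_index + 1:]
    # The substitution must create an AG that was absent in the wild type,
    # either by supplying the G (A N -> A G) or the A (N G -> A G).
    new_as_g = mutated[mut_index - 1:mut_index + 1] == "AG" != seq[mut_index - 1:mut_index + 1]
    new_as_a = mutated[mut_index:mut_index + 2] == "AG" != seq[mut_index:mut_index + 2]
    if not (new_as_g or new_as_a):
        raise ValueError("mutation does not create a new AG dinucleotide")
    return mut_index - bp_index


# Hypothetical intron end containing the putative LIPC BPS heptamer.
WILD_TYPE = "CCGGCTAAGCTTTCTTTTCCAATCTCTTTTCAG"
print(bps_to_new_ag_distance(WILD_TYPE, "GGCTAAG", mut_index=21, mut_base="G"))
```

In this invented example the A>G substitution lies 14 nt downstream of the assumed BP adenosine, inside the 11 to 15 nt window described above.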
Figure 3

AG-creating mutations in the PPT that activate aberrant 3′ss upstream of predicted BPS. (A) Aberrant 3′ss activated by newly created AGs (in italics) that repress (minus sign) authentic 3′ss in the PPT (upper panel). Distances between upstream 3′ss (U) and predicted BP, between newly created AG (mutated, M) and BP (arrow) and between the mutated and authentic (A) 3′ss are in base pairs, bp (lower panel). The S&S scores were computed for U, M and A 3′ss using an algorithm as described previously (12,13). BPS is shown as a black rectangle. Disease-causing mutations were LIPC IVS1-14A>G (52), HEXB IVS10-17A>G (19), AR IVS2-11T>A (23) and FBN2 IVS28-15A>G (53). Putative BPSs were GGCTAAG, GCCTAAT, TATCAAC and TGACAAT, respectively. (B) Schematic representation of minigene constructs. Exons are shown to scale (scale unit is 0.1 kb). The sizes of minigene introns (lines; not to scale) are shown below each construct. Intron truncations are indicated by a slash. Full LIPC introns 1 and 2 were 106.2 and 3.3 kb, respectively. Allele-specific DQB1 minigenes were described previously (15). (C) Splicing pattern of mutated minigenes after transfection into 293T cells. RT–PCR products amplified with vector-specific primers PL3 and PL4. Wild-type minigenes containing predicted BP adenine in the indicated positions were mutated to C, T and G. RNA species were confirmed by sequencing and are schematically shown on the right side and in (B). The first, second and third exons are shown as white, grey and black boxes, respectively. Introns are shown as lines. Thick lines indicate partial intron retention due to activation of aberrant splice sites. (D–F) Nucleotide sequence of RT–PCR products bridging aberrant 3′ss in mutated constructs. Exons (e) are indicated by grey rectangles, introns (IVS, intervening sequence) by a white rectangle. Aberrant 3′ss are designated by a distance from the corresponding authentic splice site. 
(G) Activation of aberrant splice sites is specific for AG dinucleotides. Mutations removing AG dinucleotides are indicated at the top and bottom of each panel. (H) AG dinucleotides within predicted BPS do not activate cryptic splice sites. Mutations creating AG dinucleotides in the predicted BPS are indicated at the top and bottom of each panel.

To investigate the distance requirements for the activation of aberrant 3′ss, we constructed three-exon splicing reporters for LIPC, HEXB and FBN2 (Figure 3B). We refer to these 3′ss as upstream (U) and mutated (M) (Figure 3A) to avoid ambiguous distinction between cryptic and de novo 3′ss in these cases. As each germline mutation was an A>G transition (19,52,53), which is characteristic of BP substitutions (15), we first attempted to determine the BP. However, none of the pre-mRNA substrates could be spliced in vitro with varying concentrations of nuclear extracts and Mg²⁺ (data not shown). As A>G substitutions in the BP impart a particularly strong block of splicing in vitro (54) and in vivo, which may facilitate BP determination (J. Královičová, H. Lei and I. Vořechovský, manuscript submitted), we mutated the putative BP adenosines into G, C and T in each wild-type construct (Figure 3C). Examination of RNA products after transfection showed that only the G-containing clones led to aberrant splicing. In FBN2, exon skipping observed in vivo in a patient with contractural arachnodactyly (53) was replicated in 293T cells where it was accompanied by a partial utilization of the newly introduced AG (Figure 3C and D). In LIPC, we found two aberrant 3′ss, as reported earlier (52). The first 3′ss was utilized in ∼6% at the site of mutation (IVS1-13) and the other was upstream of the predicted BPS (IVS1-78; Figure 3B, C and E). The ratio of unspliced/-13/-78 RNA products described previously (1/2.9/0.8, respectively) (52) was altered in favour of the upstream cryptic 3′ss in our system (Figure 3C). This was most likely due to a better BPS consensus in our construct, because a previously used minigene (52) had an intron truncation just upstream of IVS1-78 and lacked a suitable alternative BPS (data not shown). Finally, the HEXB transition (Figure 3A) also activated two aberrant 3′ss.
One was 37 nt upstream of exon 11 and the other was in exon 11, 112 nt downstream of the authentic site (Figure 3B and F). We could not exclude that IVS10-17A is the BP of the central exon as suggested earlier (19), but improved splicing of uridine- and, to a lesser extent, cytosine-containing pre-mRNAs (Figure 3C, middle panel) suggested that this mutation is in the PPT since uridines are the preferred PPT nucleotides (55,56). As LIPC and FBN2 mutations were even closer to the 3′ss than the HEXB mutation (Figure 3A), their BPs are likely to map upstream of newly introduced AGs as well. To test whether the aberrant 3′ss were activated only by AGs and not by other dinucleotides, we changed BP-1 adenosines to guanosines in each wild-type and mutated FBN2/LIPC minigene to create additional dinucleotides GA, GC, GT and GG (Figure 3G). Of the eight dinucleotides (Figure 3C and G), only AG-containing reporters maintained the splicing defects. In LIPC, utilization of the upstream 3′ss IVS1-78 was highest in pre-mRNAs containing guanosine -15 (IVS1-15G), followed by those containing A or C, and was lowest for pre-mRNAs with uridine at this position, further supporting the PPT location of the newly formed AG. To formally show that the predicted BPS itself can tolerate AG dinucleotides, we employed two splicing reporters that were derived from the TH and TSC2 genes. Both mutated minigenes represented previously observed substitutions in the predicted BPS that resulted in genetic disease (39,57). Substitutions of the predicted BP adenosines to the remaining nucleotides in both reporters dramatically increased exon skipping (J. Královičová, H. Lei and I. Vořechovský, manuscript submitted). 
Since predicted BP-As were preceded by Gs in each intron, we mutated BP-1Gs to As in the wild-type and mutated reporters, but the splicing pattern observed for the constructs carrying AA, AC, AT and AG dinucleotides was similar to that of the constructs containing GA, GC, GT and GG (Figure 3H and data not shown). Taken together, these data suggested that activation of 3′ss upstream of the BPS and inhibition of authentic 3′ss are specific for AG dinucleotides in the PPT.

Reduction of the ‘BPS-new AG’ distance improves the expression of natural transcripts

To test the role of the ‘BPS-new AG’ distance in repression and activation of three competing 3′ss, we examined the splicing pattern of LIPC and HEXB constructs containing serial 3 nt deletions in this region (Figure 4A–C). In addition, we modified this distance in the HLA-DQB1 reporter (allele *0602), which contains a preexisting AG dinucleotide downstream of the BPS (Figure 4A and D) (15). The BP of DQB1 exon 4 was determined by reverse transcription and mutagenesis (15) and corresponded to the best match to the mammalian BPS consensus in intron 3 (Figure 4A) and to a computationally predicted BPS (14).
Figure 4

A role for the BPS-new AG distance and/or the strength of the new PPT in upstream cryptic 3′ss activation. (A) Nucleotide sequences of splicing reporter constructs at the 3′ss are followed by the S&S matrix scores and by the percentage of splice site utilization of the indicated RNA products (means of duplicate transfections). Intronic sequences are in lower case, exonic sequences are in upper case. Putative BPs are shaded. U, aberrant 3′ss upstream of the predicted BPS; M, newly created (preexisting in DQB1) or proximal AG between the authentic 3′ss and BP; A, authentic or distal 3′ss; ES, exon skipping; +112, % utilization of the +112 splice site by the HEXB pre-mRNAs. The S&S scores were calculated according to the algorithm described previously (12,13). BPSs of LIPC exon 2 and of DQB1 exon 4 were predicted by a branch site tool (), with BPS scores 3.25 and 3.2, respectively. No BPS was predicted for HEXB. The HEXB IVS10-29A>G and IVS10-29A>T gave splicing patterns identical to the wild-type constructs (data not shown). RT–PCR products for the LIPC-14Y and HEXB-17Y mutations are shown in Figure 3C. (B–D) RNA products generated by wild-type and mutated constructs after transfection into 293T cells. The designation of splicing reporter constructs (top of each panel) corresponds to that in (A). RNA products were confirmed by sequencing and are schematically shown on the right side. (E) Nucleotide sequences of RT–PCR products illustrating aberrant splice sites in DQB1 reporters.

In LIPC, deletions that reduced the distance between the predicted BP and the newly created AG from 13 to 10, 7 and 4 nt eliminated use of the mutated site (Figure 4A and B, lanes 1–5). Three- and six-nt deletions rescued normal splicing to 16 and 89%, respectively, and progressively inhibited upstream 3′ss (lanes 3 and 4). However, the largest deletion no longer improved splicing to the authentic 3′ss (lane 5), most likely by reducing the gap between the BP and authentic 3′ss to only 17 nt, which is below the experimentally determined optimum of 19–23 nt (20). These results suggested that the minimum ‘BP-new AG’ distance required for engagement of the new AG in the catalytic step and for complete suppression of authentic 3′ss in this pre-mRNA was ∼13 nt. Bringing this AG closer to the BPS was associated with a significant production of natural transcripts. Their increase was accompanied by a decrease of the S&S scores for the mutated sites (Figure 4A), suggesting that diminished recognition of the de novo site due to reduced pyrimidine content upstream of the mutated AG in deletion mutants weakens AG-induced inhibition of the authentic 3′ss. Also, we could not exclude a beneficial effect of a reduced ‘BP-authentic AG’ distance on the expression of natural transcripts. In HEXB, splicing to the authentic 3′ss was rescued only by the largest deletion, which decreased the ‘BP-new AG’ distance from 15 to 6 nt (Figure 4C). Reduction of the S&S score for the mutated 3′ss in this deletion mutant was associated with increased use of authentic sites and decreased use of 3′ss E+112. In DQB1, a preexisting AG is located 7 nt downstream of the BP adenosine (Figure 4A) and is not used in the second splicing step (15). Insertion of five uridine residues in front of the 3′ss CAG consensus to reach a BP-AG distance comparable with the remaining reporters (Figure 3A) was sufficient for ∼9% utilization of this AG (Figure 4D, lane 2). 
The new 3′ss (Figure 4E) was markedly promoted further by extending the new PPT to (T)10 with a concomitant improvement of exon inclusion, both in the presence (alleles DQB1) and absence (DQB1) of a polymorphic guanosine in position -14 (Figure 4A and D, lanes 3 and 6). Together, these results showed that moving the newly created AG closer to the BP while maintaining the distance between authentic and new AGs promoted selection of the authentic 3′ss. They also suggested that AG-creating mutations located closer to the BPS and/or preceded by weaker PPTs may have less severe phenotypic consequences and provided support to the model that proposes a substrate-specific area of exclusion of AG dinucleotides downstream of the BP.
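
The distance arithmetic behind the LIPC deletion series can be laid out explicitly. The following Python sketch is illustrative only: the function and variable names are hypothetical, and the wild-type ‘BP-authentic AG’ distance of 26 nt is an assumption inferred from the 17 nt reported for the largest (9 nt) deletion. The 19–23 nt optimum is the range cited from ref. 20.

```python
# Distances in nucleotides for the LIPC IVS1-14A>G reporter.
BP_TO_NEW_AG = 13          # reported in the text
# Assumption: inferred as 17 nt (after the 9 nt deletion) + 9 nt.
BP_TO_AUTHENTIC_AG = 26
OPTIMUM_MIN, OPTIMUM_MAX = 19, 23  # optimum cited from ref. 20


def deletion_series(deletions=(0, 3, 6, 9)):
    """Return both BP-AG distances for each deletion placed between BP and new AG."""
    rows = []
    for deleted in deletions:
        bp_authentic = BP_TO_AUTHENTIC_AG - deleted
        rows.append({
            "deleted_nt": deleted,
            "bp_to_new_ag": BP_TO_NEW_AG - deleted,
            "bp_to_authentic_ag": bp_authentic,
            "within_optimum": OPTIMUM_MIN <= bp_authentic <= OPTIMUM_MAX,
        })
    return rows


if __name__ == "__main__":
    for row in deletion_series():
        print(row)
```

Under these assumptions, only the 3 and 6 nt deletions place the authentic 3′ss within the cited optimum, which is in line with the rescue reported for lanes 3 and 4 and its loss in lane 5.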

Prevalence and sequence context of intervening AG dinucleotides in authentic and aberrant 3′ss

The mutated AG was preceded by a pyrimidine and followed by a uridine in each case (Figure 3A). The occurrence of pyrimidines in the former position suggested that, similar to AGs in authentic 3′ss, cytosines and uridines may facilitate partial recognition of intervening AG dinucleotides. However, the presence of uridines that immediately followed each of these AGs was conspicuous, because uridine is the least frequent nucleotide in this position in human, mouse, zebrafish and fugu (42). This could simply reflect a higher uridine content in the surrounding sequence (cf. Figures 2D and E); nevertheless, we analysed the effect of T>G alterations in this position on LIPC and DQB1 splicing. The LIPC IVS1-13T>G mutation, which improved a match to the 3′ss YAG/R, completely eliminated utilization of this site and promoted the upstream 3′ss (Figure 4B, lane 6). In contrast, an insertion of a polymorphic guanosine in position –14 of DQB1 intron 3 further promoted the new 3′ss (Figure 4D, lanes 2 and 5). AGs between BPS and authentic 3′AG are rare (21), but they are more common near the BPS (14) or closer to the 3′ss, often as ‘tandem’ (NAGNAG) acceptors (58) that effectively compete with each other. Inspection of 147 aberrant 3′ss revealed 15 (10%; 11 in exons, 4 in introns) intervening AGs in a 14 nt sequence upstream of authentic 3′ss (Supplementary Tables 1 and 2). In contrast, corresponding authentic 3′ss had only four AGs in this region (P < 0.01; Fisher's exact test), suggesting that intervening AGs are more common in aberrant than in authentic 3′ss. Use of 3′AG in splicing is influenced by the identity of the preceding nucleotide, with a hierarchy of competitiveness CAG ∼ UAG > AAG > GAG (59,60). We analysed the frequency of each nucleotide that precedes intervening AGs as a function of distance from the BPs predicted through comparison of mouse and human introns (14). 
To maximize the probability of studying functional BPSs, we limited our analysis to 40 388 human introns, in which predicted BPs were located within a 40 nt distance from authentic 3′ss. Interestingly, cytosines were consistently over-represented in each position between 3 and ∼21 nt downstream of the predicted BP adenosine (Figure 5A). A frequency peak in position 3 apparently reflects over-representation of cytosines in the last nucleotide of the predicted BPS (YNCURAY(+1)A(+2)G(+3), where the numbers in parentheses give the distance from the BP adenosine). The relative nucleotide frequencies varied little around 40% for cytosines and 20% for the remaining nucleotides along the whole distance. An exception was positions 8 and 9 downstream of the BP adenosine, where purine depletion was even greater (Figure 5A). The non-random distribution of nucleotides that preceded intervening AGs (P < 10^−15 for positions 4–21) is consistent with the presence of splicing complexes covering this region and appears to support functionality of most predicted BPSs, although their average distance from authentic 3′ss (14) is longer than an experimentally determined optimum (20,61).
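
The comparison of intervening AGs in aberrant versus authentic 3′ss reported above (15 versus 4; Fisher's exact test) can be reproduced approximately with a short standard-library Python sketch. Several points are assumptions rather than details given in the text: each of the 147 aberrant sites and each of the 147 corresponding authentic sites is treated as one window that either contains an intervening AG or does not, and the text does not state whether the test was one- or two-sided, so both values are returned. The function name is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (one_sided, two_sided): one_sided is the probability of a
    top-left count >= a with all margins fixed; two_sided sums every
    table that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    one_sided = sum(prob(k) for k in range(a, hi + 1))
    two_sided = sum(prob(k) for k in range(lo, hi + 1)
                    if prob(k) <= p_obs * (1 + 1e-9))
    return one_sided, two_sided


if __name__ == "__main__":
    # Assumed table: 15 of 147 aberrant windows vs 4 of 147 authentic windows.
    one_sided, two_sided = fisher_exact_2x2(15, 147 - 15, 4, 147 - 4)
    print(f"one-sided P = {one_sided:.4f}, two-sided P = {two_sided:.4f}")
```

Because the two groups are assumed to be of equal size, the two-sided value is simply twice the one-sided value in this particular table.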
Figure 5

Sequence context of intervening AG dinucleotides in authentic 3′ss. Relative frequencies of nucleotides that immediately precede (A) or follow (B) AGs located 2–30 nt downstream of predicted BP adenosines. Corresponding numbers of intervening AGs are shown as grey columns. Distances between the predicted BP adenosine and downstream AG are in nucleotides (nt) as follows: 2 nt (YNCURAA(+1)G(+2); the BP is the adenosine immediately preceding position +1), 3 nt (YNCURAY(+1)A(+2)G(+3)), 4 nt (YNCURAY(+1)N(+2)A(+3)G(+4)), etc., up to 30.
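
The distance convention in this legend (the G of an intervening AG at position +n relative to the BP adenosine) can be made concrete with a minimal Python sketch that tallies the nucleotides flanking intervening AGs in a single intron. This is not the pipeline used in the study: the function name, its parameters and the toy sequence are hypothetical, and the input is assumed to end with the authentic 3′AG, which is skipped.

```python
from collections import Counter, defaultdict


def intervening_ag_context(intron, bp_index, min_dist=2, max_dist=30):
    """Tally nucleotides that precede and follow intervening AGs.

    intron   -- intronic sequence ending with the authentic 3' AG
    bp_index -- 0-based index of the predicted BP adenosine
    Distance is the offset of the G of each AG from the BP adenosine,
    so an AG starting right after the BP has distance 2.
    """
    seq = intron.upper().replace("U", "T")
    preceding = defaultdict(Counter)
    following = defaultdict(Counter)
    authentic_ag_start = len(seq) - 2
    for a_pos in range(bp_index + 1, authentic_ag_start):
        if seq[a_pos:a_pos + 2] != "AG":
            continue
        dist = a_pos + 1 - bp_index
        if min_dist <= dist <= max_dist:
            preceding[dist][seq[a_pos - 1]] += 1
            following[dist][seq[a_pos + 2]] += 1
    return preceding, following


if __name__ == "__main__":
    # Hypothetical toy intron end: BPS "tactaac" (BP adenosine at index 5),
    # one intervening AG, then a pyrimidine-rich tract and the authentic AG.
    toy = "tactaaccagttttcttttcag"
    before, after = intervening_ag_context(toy, bp_index=5)
    print(dict(before), dict(after))
```

Summing such per-intron counters over a set of introns and normalizing within each distance would give relative frequencies of the kind plotted in panels (A) and (B).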

Among nucleotides that followed intervening AGs, pyrimidines were consistently over-represented in positions 4–28 nt downstream of the predicted BP adenosine (Figure 5B, P < 10^−9), in sharp contrast to authentic 3′AGs. The number of intervening AGs was low for distances more than 20 nt downstream from the predicted BP adenosines, but 117 of the 208 (56%) AGs were followed by cytosine residues in this region. The distribution of nucleotides that precede and follow intervening AGs was very similar in 40 626 murine introns where the distance characterized by over-representation of cytosines that preceded intervening AGs had the same length (3–21 nt downstream of predicted BP adenosine; data not shown). Over-representation of pyrimidines in position +1 was also very similar in mouse introns to that observed in human introns (Figure 5 and data not shown). In a 14 nt sequence upstream of aberrant 3′ss, 5 of the 15 AGs were preceded by cytosines and 10 of the 15 AGs were followed by pyrimidines, which did not appear to be significantly different from intervening AGs of authentic 3′ss. Finally, since closely spaced AG dinucleotides at the 3′ss (20,60–62) and in a splicing silencer (63) may interfere with each other, we employed the LIPC reporter to bring the newly created and authentic 3′AGs into proximity. We introduced an 8 nt deletion just upstream of the authentic 3′ss in the construct with the new AG to shorten this distance to just 5 nt while maintaining the BP-3′ss length within the previously determined optimum (20). The 5 nt distance was still sufficient for an interplay between closely spaced AGs (20,61,63). The 8 nt deletion repressed utilization of both competing AGs and resulted in exclusive splicing to upstream 3′ss (Figure 4B, lane 7). As with DQB1 (Figure 4A and D), a small insertion of repetitive uridines between the BP and the newly created AG rescued splicing to this AG in the presence of the 8 nt deletion (Figure 4B, lane 8). 
However, this AG was used slightly less than for the -14A>G mutation, despite more optimal BP-AG distance and better PPT of the new exon, supporting a strong interference by the authentic 3′AG (Figure 4A and B, lanes 2 and 8).

The influence of SR proteins on utilization of aberrant 3′ and 5′ ss in LIPC

To illustrate the effect of known regulators of splicing on cryptic 3′ss utilization, we co-expressed the wild-type and mutated (IVS1-14A>G) LIPC reporter with SR proteins (Figure 6A). The co-transfection experiments revealed activation of two more aberrant splice sites (designated e2-24 and IVS2-94, Figure 6B) of the second minigene intron. In both mutated and wild-type reporters, activation of proximal 3′ss (IVS2-94) and distal 5′ ss (e2-24) was promoted by ASF/SF2 and SRp40. Since the IVS2-94 splice site has a very strong PPT and is separated from the authentic 5′ ss by only 61 nt, activation of the distal 5′ ss in exon 2 may be due to restrictions on minimum intron size. A subset of SR proteins also promoted the use of upstream 3′ss IVS1-78 in both the wild-type and mutated minigenes and reduced the amount of correctly spliced products and those spliced to the mutated site IVS1-14 (Figure 6A). These results confirm that SR proteins may promote selection of proximal splice sites as described (15,64,65) and suggest that the observed activation of atypical distal 5′ ss can be explained by intronic length constraints.
Figure 6

The influence of SR proteins on utilization of aberrant 3′ and 5′ ss in LIPC. (A) Wild-type and mutated splicing reporters were co-transfected with plasmids expressing the indicated SR proteins. Splicing reporters are shown at the top and SR proteins are indicated at the bottom. VO, vector only control, in which an empty pCG vector was co-transfected with the wild-type and mutated reporter constructs; NC, no co-transfection (reporter only) control. The corresponding LIPC isoforms are shown on the right side. (B) Nucleotide sequences surrounding aberrant splice sites induced by SR proteins. Exons (e) are shown as a grey rectangle, introns (IVS) are indicated by white rectangles.

DISCUSSION

In this study, we have undertaken the largest compilation of aberrant 3′ss in human disease genes to date. Although cryptic 3′ss are less frequent than cryptic 5′ ss (9), the size of the compiled dataset was comparable with that analysed earlier for cryptic 5′ ss (11). The overall number of aberrant 3′ss was marginally higher in exons than in introns (Table 1), but reports of AG-creating PPT mutations, in which RNA products were not sequenced and were thus not included in our study, appear to be more frequent in the literature (66–71) than those in exons. This suggests that aberrant 3′ss described in the literature are distributed in exons and introns with approximately equal frequencies, consistent with the observed low median distance between all aberrant and authentic 3′ss (Table 1). In contrast, if a mutation affects 3′YAG, cryptic 3′ss are ∼3 times more likely to occur in exons than in introns. Conversely, aberrant 3′ss due to mutations outside 3′YAG are ∼3 times more frequent in introns than in exons. The relative intronic depletion of the former can be accounted for by a lower prior probability of finding an alternative 3′AG in the PPT (Figure 1A and B), whereas intronic over-representation of the latter is due to a better 3′ss consensus created in the pyrimidine-rich sequence. Sequence constraints that limit the number of cryptic and de novo 3′ss also explain a lower total number of aberrant 3′ss in the literature as compared with aberrant 5′ ss. However, since recognition sequences at the 3′ ss (YAG, PPT and BPS) are more complex than at the 5′ss and spread over a longer and more variable distance, mutations affecting 3′ss may have multiple effects that make it difficult to categorize cryptic or de novo sites unambiguously. For example, apart from a direct AG creation in the PPT, pyrimidine to purine substitutions may weaken the PPT of authentic 3′ss and thus contribute to their repression. 
Similarly, mutations of the 3′ YAG (such as IVS-1G>A, Supplementary Table 4) may potentially improve a match to the BPS consensus and promote cryptic 3′ss activation in the downstream exon if a suitable YAG is available within an optimal distance from the newly created BPS. We cannot entirely exclude that this can contribute to the observed bias (Figure 1A and C); nevertheless, it is likely that their distribution can be fully explained by a lack of AGs and pyrimidine residues upstream and downstream of authentic 3′ss, respectively. Although the number of reported cryptic 3′ss in introns was low (n = 18), their proportion in terminal introns (33%) (7,18,72–74) appeared to be higher than in other categories of aberrant 3′ss (Table 1), pointing to possible involvement of 3′ end processing factors. Two de novo 3′ss observed in terminal exons were due to a transversion (36) and transition (37) in position –6 of the new intron creating uridine residues that improve PPT recognition. Purines are greatly underrepresented in this PPT position in several species (42) and both transitions and transversions of uridine –6 have been shown to increase exon skipping and/or retention of weakly spliced introns (15,46). These splicing defects are likely to result from diminished interaction of the PPT with the first RNA recognition motif of U2AF65 (75), which has been shown to promote 3′ end processing (76,77). The exact mechanism by which the 3′ss is recognized is not well understood. Although the first AG downstream from the BPS is usually selected for exon ligation, it is unclear why it is sometimes not the case and how exactly this is achieved. A ‘scanning mechanism’ of 3′ss selection (59,60) postulated a unidirectional, linear search that initiates at the BPS and continues until a suitable AG is selected. This model is consistent with blocking the second step on the hairpin structures inserted between BP and 3′AG (60,78). 
However, closely spaced, tandemly arranged AGs can efficiently compete when placed 13–22 nt downstream of the Saccharomyces cerevisiae branchpoint (61) and 15–23 nt in a metazoan substrate (20), although not when placed at least 35 nt from a mammalian BP (78). Frequent cryptic 3′ss in exons would support the scanning model (79), but our finding that cryptic 3′ss cluster in ∼20 nt region just downstream of authentic 3′ss (Figure 1A) provides no support per se for this concept since their underrepresentation in introns can be explained by AG depletion in the PPT. Several distant exonic cryptic 3′ss, such as NF1 E28+293 (293 nt from the start of exon 28) activated by the IVS27b-2A>T mutation (80) or MANBA E16+172 induced by the IVS15-2A>G transition (81), had a strong 3′ss upstream of mutated 3′AGs, but they were not used despite being much closer to the BPS than exonic cryptic 3′ss. These cases raise a speculation that exons could first be scanned for downstream AGs to completion before AGs upstream of authentic 3′ss are considered. The concomitant activation of cryptic 3′ss (Figure 3B and C) or 5′ ss (82) in both the intron and the exon indicates that inactivation of genuine splice sites eventually triggers a search for the most suitable splice sites in both directions, consistent with a general finding that more distant AGs compete less efficiently (Figure 1). Using well-documented cases of AG-creating mutations in the PPT that repress authentic 3′ss and activate aberrant 3′ss upstream of BPS, we have illustrated the importance of the ‘BPS-new AG’ distance and/or the strength of the new PPT for the expression of correctly spliced mRNAs. Interestingly, this distance (Figure 3A) was similar to that reported previously for AGs that were outcompeted by a downstream AG (62,83) and is consistent with steric interference with a factor(s) bound to this region. 
Newly created AGs are likely to recruit spliceosome components that compete for interactions with authentic 3′ss, are partially recognized by essential splicing factors, such as U2 snRNP (84), but may not assemble fully functional splicing complexes (83,85). Utilization of such competing AGs has been shown to be affected by hSlu7 (83) and SPF45 (86). Future studies should examine interactions of human homologues of yeast Prp18 and Prp22 with the reporter pre-mRNAs described here, since these proteins are required only as the BP-AG distance increases (87,88). Unexpected promotion of upstream 3′ss observed for the LIPC -14G/-13G construct (Figure 4B, lane 6) might be addressed by examining contacts between the first nucleotide of the new exon and U2AF35 (89), U5 snRNA (90) or PRP8 (91). As with cryptic 5′ ss (11), the average S&S scores of cryptic 3′ss were significantly lower than for corresponding authentic 3′ss and higher than for mutated sites (Table 1). This indicates that intrinsic differences in the consensus 3′ss sequences contribute to the cryptic 3′ss remaining completely silent in the presence of a wild-type authentic site. How such cryptic 3′ss are efficiently inhibited is poorly understood, but nucleotides that precede AGs and similar AG-AG measuring mechanisms may also play an important and more general role in auxiliary splicing sequences (63). In contrast to cryptic 3′ss, the S&S scores of de novo sites were similar to authentic sites (Table 1), suggesting that they lack a predictive value in these cases. To explore the significance of S&S scores in selection of cryptic 3′ss in exon versus intron, we paired S&S scores of intronic cryptic 3′ss with the best match to the 3′ss in adjacent exons (Supplementary Table 5), and vice versa (Supplementary Table 6). Each intronic cryptic 3′ss had a higher S&S score than the best corresponding exonic YAG (Supplementary Table 5). 
The average difference between intronic cryptic 3′ss and the best match to 3′ss in exons was high (14.0; S&S_i = 83.8 versus S&S_be = 69.8; where i stands for intron and be is the best S&S score in adjacent downstream exons). Several exons lacked the YAG consensus altogether, such as exon 10 of TP53 (92), APOE exon 4 (73) and ELN exon 16 (93), or were very short, such as 27 nt exon 32 of COL17A1 (17) or 57 nt exon 14 of COL5A1 (94), and likely to be poorly defined (95). In contrast, as many as ∼45% of the putative 3′ss in an adjacent intronic sequence had S&S scores higher than the corresponding cryptic 3′ss in exons (average S&S_e = 73.8 versus S&S_bi = 71.1; where e is exon and bi are the best scores in 100 nt of the upstream intron), with the average difference only 2.7. The high differential between the S&S scores for intronic cryptic 3′ss and their putative exon counterparts, which apparently reflects a lack of Y-runs in exons, should be considered when using this and other algorithms for prediction of competing 3′ss. In summary, our results improve prediction of aberrant 3′ss localization in human disease genes and provide a valuable resource for studying the mechanisms of 3′ss selection. They also suggest that inspection of genetic alterations in or near the 3′ss in their sequence context may in some cases facilitate prediction of their splicing and phenotypic outcomes.

SUPPLEMENTARY MATERIAL

Supplementary Material is available at NAR Online.