Jana Královicová, Mikkel B Christensen, Igor Vorechovský.
Abstract
We compiled sequences of previously published aberrant 3' splice sites (3'ss) that were generated by mutations in human disease genes. Cryptic 3'ss, defined here as those resulting from a mutation of the 3'YAG consensus, were more frequent in exons than in introns. They clustered in a region of approximately 20 nt adjacent to authentic 3'ss, suggesting that their under-representation in introns is due to a depletion of AG dinucleotides in the polypyrimidine tract (PPT). In contrast, most aberrant 3'ss induced by mutations outside the 3'YAG consensus (designated 'de novo') were in introns. The activation of intronic de novo 3'ss was largely due to AG-creating mutations in the PPT. Exonic de novo 3'ss, by comparison, were more often induced by mutations improving the PPT, branchpoint sequence (BPS) or distant auxiliary signals than by direct AG creation. The Shapiro–Senapathy (S&S) matrix scores had good predictive value for cryptic, but not de novo, 3'ss. Finally, AG-creating mutations in the PPT that produced aberrant 3'ss upstream of the predicted BPS in vivo shared a similar 'BPS–new AG' distance. Reducing this distance and/or the strength of the PPT preceding the new AG in splicing reporter pre-mRNAs improved utilization of authentic 3'ss, suggesting that AG-creating mutations located closer to the BPS and preceded by a weaker PPT may result in less severe splicing defects.
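The Shapiro–Senapathy (S&S) matrix scores rank a candidate splice site by how well it matches a position-frequency matrix, rescaled to a 0–100 range. The sketch below assumes the basic published form, score = 100 × (Σtᵢ − Σminᵢ)/(Σmaxᵢ − Σminᵢ), where tᵢ is the frequency of the observed base at position i and minᵢ/maxᵢ are the lowest and highest frequencies at that position. `TOY_MATRIX` and `ss_score` are hypothetical names and the three-position matrix is invented for illustration; the acceptor-site version is usually described as adding a separate polypyrimidine-tract term, which is not modelled here.

```python
# Hypothetical toy matrix: percentage of each base at three positions ending in the AG.
# Illustrative values only; this is NOT the published S&S 3'ss matrix.
TOY_MATRIX = [
    {"A": 5, "C": 70, "G": 5, "T": 20},   # position -3 (the Y of YAG)
    {"A": 100, "C": 0, "G": 0, "T": 0},   # position -2 (A of AG)
    {"A": 0, "C": 0, "G": 100, "T": 0},   # position -1 (G of AG)
]


def ss_score(seq, matrix):
    """Basic Shapiro-Senapathy score: 100 * (t - min) / (max - min)."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must equal the number of matrix positions")
    t = sum(col[base] for base, col in zip(seq.upper(), matrix))   # observed total
    lowest = sum(min(col.values()) for col in matrix)               # worst possible total
    highest = sum(max(col.values()) for col in matrix)              # best possible total
    return 100.0 * (t - lowest) / (highest - lowest)


print(round(ss_score("CAG", TOY_MATRIX), 1))  # 100.0
print(round(ss_score("TAG", TOY_MATRIX), 1))  # 81.1
```

Changing either base of the AG pushes this toy score sharply downward, because those two positions dominate the matrix.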
Year: 2005 PMID: 16141195 PMCID: PMC1197134 DOI: 10.1093/nar/gki811
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Summary of aberrant 3′ splice sites in human genes

| Location of cryptic or de novo 3′ss | Exon: mutation in 3′YAG (cryptic) | Exon: mutation outside 3′YAG ('de novo') | Intron: mutation in 3′YAG (cryptic) | Intron: mutation outside 3′YAG ('de novo') | Total |
|---|---|---|---|---|---|
| Number of genes | 39 | 16 | 14 | 35 | 89 |
| Number of cryptic/de novo 3′ss (%) | 59 (74) | 21 (26) | 18 (27) | 49 (73) | 147 |
| Number of unique 3′ss (%) | 56 (75) | 19 (25) | 18 (27) | 48 (73) | 141 |
| Reading frame 0 | 18 | 7 | 5 | 19 | 49 (33.3%) |
| Reading frame +1 | 23 | 5 | 8 | 15 | 51 (34.7%) |
| Reading frame +2 | 18 | 9 | 5 | 15 | 47 (32.0%) |
| Average distance (nt) between authentic and cryptic 3′ss (SD) | 43.8 (46.4) | 57.9 (29.9) | −71.6 (120.3) | −19.1 (33.7) | 10.7 (67.2) |
| Median distance (nt) between authentic and cryptic 3′ss | 12 | 54 | −44 | −12 | 2 |
| Number of terminal exons (%) | 10 (17) | 2 (10) | 6 (33) | 3 (6) | 21 (14) |
| Average S&S score (SD): authentic (A) | 85.3 (6.1) | 80.8 (10.0) | 87.6 (4.9) | 81.3 (6.6) | 83.6 (7.2) |
| Average S&S score (SD): mutated (M) | 68.4 (8.0) | 80.2 (9.7) | 72.3 (5.3) | 78.8 (8.4) | 74.1 (9.5) |
| Average S&S score (SD): cryptic/de novo (CR) | 74.2 (8.2) | 82.3 (7.6) | 83.8 (6.0) | 77.6 (7.9) | 77.7 (8.5) |
| Average difference A − M (Pᵃ) | 16.9 (2 × 10⁻¹⁷) | 0.9 (NS) | 15.1 (9 × 10⁻¹⁰) | 2.5 (NS) | 9.6 (10⁻¹⁸) |
| Average difference M − CR (Pᵃ) | 5.7 (3 × 10⁻⁴) | 2.0 (NS) | 11.4 (10⁻¹⁶) | −1.2 (NS) | −3.6 (6 × 10⁻⁴) |
| Average difference A − CR (Pᵃ) | 11.2 (10⁻¹⁸) | −1.2 (NS) | 3.8 (0.07) | 3.7 (0.04) | 6.0 (10⁻¹⁹) |

ᵃ Mann–Whitney rank test (SPSS, SPSS Inc., USA).
ᵇ Excluding an outlier of 1165 nt (48).
SD, standard deviation; NS, not statistically significant; S&S, Shapiro–Senapathy.
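The P values in the table above come from a Mann–Whitney rank test run in SPSS. As a hedged illustration of what that test measures, the stdlib sketch below computes the U statistic from average ranks and a two-sided P value from the normal approximation, without tie or continuity corrections, so it will not reproduce SPSS output exactly for small or heavily tied samples. The function names and both score lists are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U for x, two-sided P) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


# Hypothetical S&S scores for authentic and mutated 3'ss.
authentic = [85, 88, 90, 82, 87]
mutated = [66, 70, 72, 68, 75]
u, p = mann_whitney(authentic, mutated)
print(u, round(p, 4))
```

Here every authentic score exceeds every mutated score, so U reaches its maximum, n1 × n2.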
Cryptic and de novo 3′ splice sites in introns
| Gene | Phenotype | Mutation | Location of aberrant 3′ss |
|---|---|---|---|
| APOE | ApoE deficiency | IVS3-2A>G | IVS3-52 |
| AR | Androgen insensitivity | IVS2-11T>A | IVS2-69 |
| ATM | Ataxia telangiectasia | IVS32-12A>G | IVS32-11 |
| ATM | Ataxia telangiectasia | IVS16-10T>G | IVS16-9 |
| ATP7B | Wilson disease | IVS11-2A>G | IVS11-39 |
| BRCA1 | Breast cancer | IVS7-24del10 | IVS7-59 |
| BRCA1 | Familial breast cancer | IVS5-12A>G | IVS5-11 |
| CFTR | Cystic fibrosis | IVS17a-26A>G | IVS17a-25 |
| CHIT1 | Chitotriosidase deficiency | E10+20dupl24 | E10+103 |
| COL17A1 | Benign epidermolysis bullosa | IVS31-1G>T | IVS31-69, IVS31-264 |
| COL5A1 | Ehlers-Danlos syndrome type II | IVS13-2A>G | IVS13-100 |
| CPO | Hereditary coproporphyria | IVS1-15C>G | IVS1-14 |
| CYBB | Chronic granulomatous disease | IVS4-15del36 | IVS4-179 |
| CYP21B | 21-hydroxylase deficiency | IVS2-13C>G | IVS2-19, IVS2-33 |
| DBT | Maple syrup urine disease | IVS4-17TTT>AAA | IVS4-17 |
| DMD | Muscular dystrophy | IVS59-9T>A | IVS59-7 |
| DMD | Muscular dystrophy | IVS8-15A>G | IVS8-14 |
| ELN | Supravalvular aortic stenosis | IVS15-3C>G | IVS15-44 |
| ERCC3 | Xeroderma pigmentosum | IVS14-6C>A | IVS14-4 |
| FANCA | Fanconi anaemia | IVS15-1G>T | IVS15-90 |
| GCH1 | Dystonia | IVS2-2A>G | IVS2-1 |
| HBB | Beta-thalassaemia | IVS1-15T>G | IVS1-14 |
| HBB | Beta-thalassaemia | IVS1-21G>A | IVS1-19 |
| HBB | Beta-thalassaemia | IVS2-A>G | IVS2-271 |
| HEXB | Sandhoff disease | IVS10-17A>G | IVS10-37 |
| HEXB | Sandhoff disease | IVS12-26G>A | IVS12-24 |
| HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS8-3T>G | IVS8-2 |
| HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS8-16G>A | IVS8-14 |
| ITGB2 | Leukocyte adhesion deficiency | IVS6-14C>A | IVS6-12 |
| L1CAM | X-linked hydrocephalus | IVS17-19A>C | IVS17-69 |
| L1CAM | X-linked hydrocephalus, MASA syndrome, spastic paraplegia | IVS26-12G>A | IVS26-10 |
| LIPC | Hepatic lipase deficiency | IVS1-14A>G | IVS1-78, IVS1-13 |
| MECP2 | Rett syndrome | IVS1-6C>G | IVS1-5 |
| MLYCD | Malonyl-CoA decarboxylase deficiency | IVS4-14A>G | IVS4-13 |
| MPO | Myeloperoxidase deficiency | IVS11-2A>C | IVS11-109 |
| MTM1 | X-linked recessive myotubular myopathy | IVS12-10A>G | IVS12-9 |
| MYBPC3 | Hypertrophic cardiomyopathy | IVS14-13G>A | IVS14-11 |
| NF1 | Neurofibromatosis type I | IVS15-16A>G | IVS15-15 |
| NF1 | Neurofibromatosis type I | IVS15-15A>G | IVS15-14 |
| NF1 | Neurofibromatosis type I | IVS10a-9T>A | IVS10a-7 |
| NF1 | Neurofibromatosis type I | IVS39-12T>A | IVS39-10 |
| NF1 | Neurofibromatosis type I | IVS26-2A>T | IVS26-14, IVS26-17 |
| NF1 | Neurofibromatosis type I | IVS11-3C>G | IVS11-43 |
| NF1 | Neurofibromatosis type I | IVS15-12T>G | IVS15-11 |
| OCA2 | Type II oculocutaneous albinism | IVS5-19A>G | IVS5-18 |
| PAH | Phenylketonuria | IVS8-7A>G | IVS8-6 |
| PAH | Phenylketonuria | IVS10-11G>A | IVS10-9 |
| RB1 | Retinoblastoma | IVS22-8T>A | IVS22-6 |
| SAG (S-ARRESTIN) | Retinitis pigmentosa | IVS10-25A>G | IVS1-24 |
| SALL1 | Townes-Brocks syndrome | IVS2-19T>A | IVS2-17 |
| SERPINC1 | Type I antithrombin deficiency | IVS4-14G>A | IVS4-12 |
| SOD1 | Amyotrophic lateral sclerosis | IVS4-10T>G | IVS4-9 |
| TCF1 (HNF-1A) | Maturity-onset diabetes of the young | IVS4-2A>G | IVS4-202 |
| TCF1 (HNF-1A) | Maturity-onset diabetes of the young | IVS7-6G>A | IVS7-4 |
| TH | Extrapyramidal movement disorder | IVS11-24T>A | IVS11-36 |
| TNFRSF1A | Periodic fever syndrome | IVS3-14G>A | IVS3-12 |
| TP53 | Li-Fraumeni syndrome | IVS9-1G>C | IVS9-44 |
| TP53 | Li-Fraumeni syndrome | IVS3-11C>G | IVS3-10 |
| TPMT | Thiopurine methyltransferase deficiency | IVS9-1G>A | IVS9+1, IVS9-330 |
| WRN | Werner's syndrome | IVS29-7T>A | IVS29-5 |
| ZAP70 | Severe combined immunodeficiency | IVS9-11G>A | IVS9-9 |
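Many AG-creating rows in the intron table follow a simple positional rule: a substitution at IVSn−k that completes an AG makes the nucleotide just after that AG the first base of the extended exon. A new G at −k pairs with an A at −(k+1), so the exon starts at −(k−1); a new A at −k pairs with a G at −(k−1), so it starts at −(k−2). The hypothetical helper `candidate_new_3ss` below encodes only this arithmetic. It assumes the required neighbouring A or G is present, and it cannot say which AG is actually used: rows such as AR IVS2-11T>A (IVS2-69) or HEXB IVS10-17A>G (IVS10-37) depart from the rule.

```python
import re
from typing import Optional

# Simple intronic substitutions such as "IVS8-16G>A" or "IVS10a-9T>A".
IVS_SUBSTITUTION = re.compile(r"^IVS(\d+[a-z]?)-(\d+)([ACGT])>([ACGT])$")


def candidate_new_3ss(mutation: str) -> Optional[str]:
    """Hypothetical helper: intronic position where the new exon would start if the
    substitution completes an AG (with a pre-existing neighbour) and that AG is used."""
    match = IVS_SUBSTITUTION.match(mutation)
    if match is None:
        return None  # not a single-nucleotide intronic substitution
    intron, k = match.group(1), int(match.group(2))
    ref, alt = match.group(3), match.group(4)
    if ref == alt:
        return None
    if alt == "G":
        start = k - 1  # new G at -k pairs with an A at -(k+1)
    elif alt == "A":
        start = k - 2  # new A at -k pairs with a G at -(k-1)
    else:
        return None  # a new C or T cannot complete an AG
    if start < 1:
        return None  # the new exon start would fall inside the authentic exon
    return f"IVS{intron}-{start}"


for mutation in ["IVS17a-26A>G", "IVS8-16G>A", "IVS22-8T>A", "IVS2-11T>A"]:
    print(mutation, candidate_new_3ss(mutation))
```

The first three calls agree with the table (IVS17a-25, IVS8-14, IVS22-6). The last returns IVS2-9, whereas the table lists IVS2-69 for this AR mutation.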
Figure 1. Distribution of the distances between authentic and aberrant 3′ss. (A) Cryptic splice sites resulting from mutations at the 3′YAG consensus. (B) Cryptic splice sites resulting from mutations of the 3′YAG, excluding cryptic 3′ss at exon positions 1–21. (C) Aberrant 3′ss due to mutations outside the 3′YAG ('de novo'). Kernel density plots were fitted to the distances between authentic and cryptic/de novo 3′ss with the Stata statistical package (v. 8.2, StataCorp, TX). Positive and negative numbers correspond to aberrant 3′ss located in the downstream exon or the upstream intron, respectively. Short vertical bars show the number of aberrant 3′ss at each distance (in nt), with the corresponding scale on the right. A cryptic splice site located at a large distance from the authentic 3′ss (48) was omitted from the plot.
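Figure 1 summarizes the distances with kernel density estimates fitted in Stata. A minimal stdlib sketch of the same idea follows, assuming a Gaussian kernel and Silverman's rule-of-thumb bandwidth; neither choice is necessarily what Stata used. The function names and the distance values are hypothetical, with negative values in the upstream intron and positive values in the downstream exon, as in the figure.

```python
import math
import statistics


def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(SD, IQR / 1.34) * n ** (-1/5)."""
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(sd, (q3 - q1) / 1.34) or sd  # fall back to SD if the IQR is zero
    return 0.9 * spread * len(data) ** (-1 / 5)


def gaussian_kde(data, h):
    """Return a function estimating the density at x with a Gaussian kernel of width h."""
    scale = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


# Hypothetical distances (nt): negative = upstream intron, positive = downstream exon.
distances = [-52, -19, -14, -12, -11, -9, -4, -2, 7, 12, 16]
h = silverman_bandwidth(distances)
f = gaussian_kde(distances, h)
print(round(h, 2), [round(f(x), 4) for x in (-50, -10, 0, 15)])
```

The estimate integrates to approximately 1 and peaks where the distances cluster.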
Figure 2. Consensus sequences of aberrant 3′ss. (A) Cryptic splice acceptors in exons resulting from mutations of the 3′YAG; (B) 3′ss in exons due to mutations outside the 3′YAG; (C) cryptic 3′ss in introns resulting from mutations of the 3′YAG; (D) aberrant 3′ss in introns generated by mutations outside the 3′YAG; (E) consensus sequences of the corresponding authentic 3′ss (n = 147). Relative nucleotide frequencies at each position were plotted with a pictogram utility. The height of each letter is proportional to the frequency of the corresponding base at the given position. Arrows indicate over-representation of adenine at position −3 (A) and of uridine at position +1 (D) of the new intron or exon, respectively.
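The pictograms in Figure 2 plot per-position base frequencies of aligned splice-site sequences. A minimal sketch of that tally follows, assuming the input sequences are already aligned on the 3′ss and have equal length; `position_frequencies` and the four example sequences are hypothetical, and U is counted as T.

```python
from collections import Counter


def position_frequencies(seqs):
    """For each alignment column, return the relative frequency of A, C, G and T (U as T)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    table = []
    for i in range(length):
        counts = Counter(s[i].upper().replace("U", "T") for s in seqs)
        table.append({base: counts[base] / len(seqs) for base in "ACGT"})
    return table


# Hypothetical aligned 3'ss sequences (intron ...YAG | exon ...).
sites = ["TTCAGG", "CTCAGA", "TTTAGG", "CCCAGT"]
for position, freqs in enumerate(position_frequencies(sites)):
    print(position, freqs)
```

Letter heights in a pictogram are then drawn in proportion to these frequencies.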
Cryptic and de novo 3′ splice sites in exons
| Gene | Phenotype | Mutation | Location of aberrant 3′ss |
|---|---|---|---|
| ABCR (ABCA4) | Stargardt disease | E16+1G>C | E16+3 |
| ACAT1 | Mitochondrial acetoacetyl-CoA thiolase deficiency | E5+46C>T | E5+51 |
| ALG8 | Glycosylation deficiency | IVS1-2A>G | E2+11 |
| ARSA | Metachromatic leukodystrophy | E8+22C>T | E8+27 |
| ASS | Citrullinaemia | IVS14-1G>C | E15+7 |
| ATM | Ataxia telangiectasia | IVS38-2A>C | E39+61 |
| ATM | Ataxia telangiectasia | IVS64-1G>C | E65+13 |
| BRCA2 | Breast cancer | IVS23-2A>G | E24+7 |
| BTD | Biotinidase deficiency | E1+56G>A | E1+57 |
| CBFA2 (RUNX1) | Familial thrombocytopenia | IVS20-1G>T | E21+13 |
| CDKL5 | Rett syndrome | IVS13-1G>A | E14+1 |
| CLN3 | Batten disease | IVS15-1G>T | E16+5 |
| COH1 | Cohen syndrome | IVS51-1G>T | E52+16 |
| COL17A1 | Epidermolysis bullosa | IVS31-1G>T | E32+9 |
| COL17A1 | Epidermolysis bullosa simplex | IVS21-2A>C | E22+27 |
| COL1A2 | Ehlers-Danlos syndrome type VII | IVS5-1G>C | E6+15 |
| COL1A2 | Osteogenesis imperfecta | IVS27-2A>G | E28+46 |
| COL2A1 | Stickler syndrome | IVS17-2A>G | E18+16 |
| COL5A1 | Ehlers-Danlos syndrome type I | IVS4-2A>G | E5+12, E5+15 |
| DAF | CD55 deficiency | E5+18C>T | E5+44 |
| DMD | Dystrophinopathy | IVS20-2A>G | E21+7 |
| DMD | Muscular dystrophy | IVS74-2A>G | E76+60 |
| EPB42 | Recessive hereditary spherocytosis | E11+39G>T | E11+41 |
| F8 | Haemophilia A | E11+32G>T | E11+36 |
| F8 | Haemophilia A | E16+26G>A | E16+47 |
| F8 | Haemophilia A | IVS15+1G>T | E16+47 |
| FGB | Hypofibrinogenaemia | E4+115T>A | E4+116 |
| G6PC | Glycogen storage disease type 1a | E5+86G>T | E5+91 |
| G6PD | Glucose-6-phosphate dehydrogenase deficiency | IVS10-2A>G | E11+9 |
| GH-1 | Growth hormone deficiency | IVS3del28-45 | E3+98 |
| GLA | Fabry disease | IVS3-1G>A | E4+1 |
| GLA | Fabry disease | IVS6-1G>A | E7+1 |
| GPB | Henshaw antigen | E5+65C>G | E5+65 |
| HEXB | Sandhoff disease | E11+8C>T | E11+112 |
| HEXB | Sandhoff disease | IVS10-17A>G | E11+112 |
| HLA-B*3916 | Deficient expression of | E3+17G>C | E3+19 |
| HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS1-2A>G | E2+5 |
| HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS5-1G>A | E6+1 |
| HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS7-1G>A | E8+21 |
| HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS9-1G>A | E10+17 |
| HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS9-2A>G | E10+17 |
| HPRT1 | Hypoxanthine–guanine phosphoribosyltransferase deficiency | IVS9-2A>T | E10+17 |
| INSR | Rabson-Mendenhall's syndrome | IVS4-2A>G | E5+12 |
| ITGA2B | Glanzmann thrombasthenia | IVS3-3del13 | E4+18 |
| ITGB4 | Epidermolysis bullosa | IVS31-19T>A | E32+38 |
| KRT14 | Recessive epidermolysis bullosa simplex | IVS2-2A>C | E3+10 |
| LAMA2 | Muscular dystrophy | IVS28-1G>C | E29+69 |
| LAMC2 | Junctional epidermolysis bullosa | IVS3-1G>A | E3+2 |
| LDLR | Familial hypercholesterolaemia | IVS1-1G>C | E2+10 |
| LDLR | Familial hypercholesterolaemia | IVS7-1G>C | E8+17 |
| LDLR | Familial hypercholesterolaemia | IVS9-1G>A | E10+7 |
| LDLR | Familial hypercholesterolaemia | IVS9-30 GTGCTGATG>CGGCT | E10+54 |
| LHX4 | Syndromic short stature | IVS4-1G>C | E5+12, E5+20 |
| MANBA | Beta-mannosidosis | IVS15-2A>G | E16+172 |
| NF1 | Neurofibromatosis type 1 | IVS27b-2A>T | E28+293 |
| NIS (SLC5A5) | Congenital hypothyroidism | E13+67C>G | E13+67 |
| OAS1 | Oligoadenylate synthase activity | IVS6-1A>G | E7+1, E7+137 |
| OTC | Ornithine transcarbamylase deficiency | IVS4-2A>T | E5+12 |
| PDE6B | Autosomal recessive retinitis pigmentosa | IVS2-1G>T | E3+12 |
| PFKM | Muscle phosphofructokinase deficiency | IVS6-2A>C | E7+5, E7+12 |
| PKLR | Pyruvate kinase deficiency | IVS3-2A>T | E4+6 |
| PTPS | Pyruvoyl-tetrahydropterin synthase deficiency | IVS1-3C>G | E2+12 |
| SCNN1G | Pseudohypoaldosteronism type 1 | IVS2-1G>A | E3+6 |
| SPG4 | Spastic paraplegia | IVS6-1G>A | E7+8 |
| SPINK5 | Netherton syndrome | IVS20-1G>A | E21+1 |
| TNFSF5 (HIGM1) | X-linked hyper-IgM syndrome | IVS4-2A>G | E5+8 |
| TP53 | Li-Fraumeni syndrome | IVS3-1G>A | E4+19 |
| TP53 | Li-Fraumeni syndrome | IVS5-11del11 | E6+17 |
| TP53 | Li-Fraumeni syndrome | IVS5-1G>A | E6+1 |
| TP53 | Lung cancer | IVS3-1G>C | E4+19 |
| TSC2 | Tuberous sclerosis | IVS38-18G>A | E39+74 |
| TSC2 | Tuberous sclerosis | IVS9-15G>A | E10+56 |
| TSC2 | Tuberous sclerosis | IVS9-3C>G | E10+56 |
| UGT1A1 | Crigler-Najjar syndrome type 1 | IVS3-2A>G | E4+107 |
| UGT1A1 | Crigler-Najjar syndrome type 1 | IVS4-1G>A | E5+7 |
| XPA | Xeroderma pigmentosum group A | IVS3-1G>C | E4+2 |
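The summary table sorts aberrant 3′ss into reading-frame classes 0, +1 and +2. A hypothetical way to derive such a class from locations written as in the two tables above is to convert each location into the net number of nucleotides gained or lost from the mRNA, then reduce that number modulo 3. The sketch assumes that En+k removes k−1 exonic nucleotides and that IVSn−k adds k intronic nucleotides. Whether this matches the exact 0/+1/+2 convention used in the summary table is itself an assumption. `net_length_change` and `frame_class` are illustrative names.

```python
import re

# Locations as written in the tables: "E5+12" (exon) or "IVS4-13" / "IVS17a-25" (intron).
LOCATION = re.compile(r"^(E|IVS)(\d+[a-z]?)([+-])(\d+)$")


def net_length_change(location):
    """Nucleotides gained (+) or lost (-) from the mRNA when this 3'ss replaces the
    authentic one (hypothetical helper; assumes the authentic exon starts at E+1)."""
    match = LOCATION.match(location)
    if match is None:
        raise ValueError(f"unsupported location: {location}")
    kind, sign, k = match.group(1), match.group(3), int(match.group(4))
    if kind == "E" and sign == "+":
        return -(k - 1)  # exon now starts at +k, so k - 1 exonic nt are skipped
    if kind == "IVS" and sign == "-":
        return k  # the last k intronic nt are retained upstream of the exon
    raise ValueError(f"not an upstream-intron or downstream-exon 3'ss: {location}")


def frame_class(location):
    """Net length change modulo 3 (0 = reading frame preserved)."""
    return net_length_change(location) % 3


for loc in ["E5+12", "E10+7", "IVS4-13", "IVS9-44"]:
    print(loc, net_length_change(loc), frame_class(loc))
```

Python's `%` always returns a non-negative result for a positive modulus, so exonic deletions also map onto 0, 1 or 2.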
Figure 3. AG-creating mutations in the PPT that activate aberrant 3′ss upstream of the predicted BPS. (A) Aberrant 3′ss activated by newly created AGs (in italics) that repress (minus sign) authentic 3′ss in the PPT (upper panel). Distances (in bp) between the upstream 3′ss (U) and the predicted BP, between the newly created AG (mutated, M) and the BP (arrow), and between the mutated and authentic (A) 3′ss are shown in the lower panel. The S&S scores for U, M and A 3′ss were computed using an algorithm described previously (12,13). The BPS is shown as a black rectangle. Disease-causing mutations were LIPC IVS1-14A>G (52), HEXB IVS10-17A>G (19), AR IVS2-11T>A (23) and FBN2 IVS28-15A>G (53); the putative BPSs were GGCTAAG, GCCTAAT, TATCAAC and TGACAAT, respectively. (B) Schematic representation of minigene constructs. Exons are drawn to scale (scale unit, 0.1 kb). The sizes of minigene introns (lines; not to scale) are shown below each construct, and intron truncations are indicated by a slash. The full-length LIPC introns 1 and 2 were 106.2 and 3.3 kb, respectively. Allele-specific DQB1 minigenes were described previously (15). (C) Splicing pattern of mutated minigenes after transfection into 293T cells. RT–PCR products were amplified with the vector-specific primers PL3 and PL4. In wild-type minigenes, the predicted BP adenine at the indicated positions was mutated to C, T and G. RNA species were confirmed by sequencing and are shown schematically on the right and in (B). The first, second and third exons are shown as white, grey and black boxes, respectively; introns are shown as lines. Thick lines indicate partial intron retention due to activation of aberrant splice sites. (D–F) Nucleotide sequences of RT–PCR products bridging aberrant 3′ss in mutated constructs. Exons (e) are indicated by grey rectangles and introns (IVS, intervening sequence) by a white rectangle. Aberrant 3′ss are designated by their distance from the corresponding authentic splice site.
(G) Activation of aberrant splice sites is specific for AG dinucleotides. Mutations removing AG dinucleotides are indicated at the top and bottom of each panel. (H) AG dinucleotides within predicted BPS do not activate cryptic splice sites. Mutations creating AG dinucleotides in the predicted BPS are indicated at the top and bottom of each panel.
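Figure 3 concerns substitutions that create new AG dinucleotides in the PPT. A minimal sketch of the underlying check follows. It assumes a plain sequence string and a 0-based substitution index, both hypothetical inputs, and lists the AGs present in the mutant but absent from the wild type. It says nothing about whether the new AG is actually used; Figure 3H, for example, shows that AGs created within the predicted BPS were not.

```python
def ag_positions(seq):
    """0-based indices of the A of every AG dinucleotide in seq."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"}


def created_ags(wild_type, index, alt):
    """Apply one substitution (0-based index) and return AGs present only in the mutant."""
    if not 0 <= index < len(wild_type):
        raise IndexError("substitution index outside the sequence")
    mutant = wild_type[:index] + alt + wild_type[index + 1:]
    return sorted(ag_positions(mutant) - ag_positions(wild_type))


# Hypothetical PPT ending in the authentic AG (A at index 8).
ppt = "TTTATTTCAG"
print(created_ags(ppt, 4, "G"))  # [3]
print(created_ags(ppt, 4, "A"))  # []
```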
Figure 4. A role for the BPS–new AG distance and/or the strength of the new PPT in activation of upstream cryptic 3′ss. (A) Nucleotide sequences of splicing reporter constructs at the 3′ss, followed by the S&S matrix scores and the percentage utilization of each splice site for the indicated RNA products (means of duplicate transfections). Intronic sequences are in lower case and exonic sequences in upper case. Putative BPs are shaded. U, aberrant 3′ss upstream of the predicted BPS; M, newly created (pre-existing in DQB1) or proximal AG between the authentic 3′ss and the BP; A, authentic or distal 3′ss; ES, exon skipping; +112, percentage utilization of splice site +112 in HEXB pre-mRNAs. The S&S scores were calculated using the algorithm described previously (12,13). The BPSs of LIPC exon 2 and DQB1 exon 4 were predicted with a branch site tool, with BPS scores of 3.25 and 3.2, respectively; no BPS was predicted for HEXB. The HEXB IVS10-29A>G and IVS10-29A>T mutations gave splicing patterns identical to those of the wild-type constructs (data not shown). RT–PCR products for the LIPC-14Y and HEXB-17Y mutations are shown in Figure 3C. (B–D) RNA products generated by wild-type and mutated constructs after transfection into 293T cells. Construct designations (top of each panel) correspond to those in (A). RNA products were confirmed by sequencing and are shown schematically on the right. (E) Nucleotide sequences of RT–PCR products illustrating aberrant splice sites in DQB1 reporters.
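Figure 4A reports splice-site utilization as percentages averaged over duplicate transfections. A minimal sketch of that arithmetic follows, assuming each transfection yields one signal value per RNA product (for example, a band intensity). `mean_utilization` and the intensities are hypothetical, and the sketch does not correct for differences in product length.

```python
def mean_utilization(replicates):
    """Mean percentage of each RNA product across replicate transfections.

    Each replicate maps a product label (e.g. 'U', 'M', 'A', 'ES') to its signal.
    """
    labels = list(replicates[0])
    means = {}
    for label in labels:
        # Convert each replicate to percentages of its own total before averaging.
        shares = [100.0 * rep[label] / sum(rep.values()) for rep in replicates]
        means[label] = sum(shares) / len(shares)
    return means


# Hypothetical band intensities from duplicate transfections of one reporter.
duplicates = [{"U": 30, "M": 10, "A": 60}, {"U": 20, "M": 20, "A": 60}]
print(mean_utilization(duplicates))  # {'U': 25.0, 'M': 15.0, 'A': 60.0}
```

Normalizing within each replicate before averaging keeps a replicate with a stronger overall signal from dominating the mean.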
Figure 5. Sequence context of intervening AG dinucleotides in authentic 3′ss. Relative frequencies of nucleotides that immediately precede (A) or follow (B) AGs located 2–30 nt downstream of predicted BP adenosines. The corresponding numbers of intervening AGs are shown as grey columns. The distance between the predicted BP adenosine (the A at position 6 of YNCURA) and the downstream AG is counted in nucleotides up to the G, as follows: 2 nt (YNCURA A⁺¹ G⁺²), 3 nt (YNCURA Y⁺¹ A⁺² G⁺³), 4 nt (YNCURA Y⁺¹ N⁺² A⁺³ G⁺⁴), and so on up to 30 nt.
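Figure 5 tallies AG dinucleotides whose G lies 2–30 nt downstream of the predicted BP adenosine, together with the base immediately before and after each AG. A minimal sketch of that scan follows. It assumes the BP adenosine position is already known as a 0-based index into a hypothetical sequence string, and it uses the legend's convention that the distance is counted up to the G (2 nt for BP A, A, G). `downstream_ags` is an illustrative name, and a base that falls outside the string is reported as `None`.

```python
def downstream_ags(seq, bp_index, min_dist=2, max_dist=30):
    """List (distance, preceding_base, following_base) for each AG whose G lies
    min_dist..max_dist nt downstream of the BP adenosine at bp_index."""
    seq = seq.upper()
    if seq[bp_index] != "A":
        raise ValueError("branch point must be an adenosine")
    hits = []
    last_g = min(bp_index + max_dist, len(seq) - 1)
    for g in range(bp_index + min_dist, last_g + 1):
        if seq[g - 1:g + 1] == "AG":
            before = seq[g - 2] if g - 2 >= 0 else None
            after = seq[g + 1] if g + 1 < len(seq) else None
            hits.append((g - bp_index, before, after))
    return hits


# Hypothetical intron fragment; the BP adenosine is the second A of CTAAC (index 3).
fragment = "CTAACTTTCAGGTTTAG"
print(downstream_ags(fragment, 3))  # [(7, 'C', 'G'), (13, 'T', None)]
```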
Figure 6. The influence of SR proteins on utilization of aberrant 3′ and 5′ss in LIPC. (A) Wild-type and mutated splicing reporters were co-transfected with plasmids expressing the indicated SR proteins. Splicing reporters are shown at the top and SR proteins at the bottom. VO, vector-only control, in which an empty pCG vector was co-transfected with the wild-type and mutated reporter constructs; NC, no co-transfection (reporter-only) control. The corresponding LIPC isoforms are shown on the right. (B) Nucleotide sequences surrounding aberrant splice sites induced by SR proteins. Exons (e) are shown as a grey rectangle and introns (IVS) as white rectangles.