François-Olivier Desmet, Dalil Hamroun, Marine Lalande, Gwenaëlle Collod-Béroud, Mireille Claustres, Christophe Béroud.
Abstract
Thousands of mutations are identified yearly. Although many directly affect protein expression, an increasing proportion of mutations is now believed to influence mRNA splicing. They mostly affect existing splice sites, but synonymous, non-synonymous or nonsense mutations can also create or disrupt splice sites or auxiliary cis-splicing sequences. To facilitate the analysis of the different mutations, we designed Human Splicing Finder (HSF), a tool to predict the effects of mutations on splicing signals or to identify splicing motifs in any human sequence. It contains all available matrices for auxiliary sequence prediction as well as new ones for binding sites of the 9G8 and Tra2-beta Serine-Arginine proteins and the hnRNP A1 ribonucleoprotein. We also developed new Position Weight Matrices to assess the strength of 5' and 3' splice sites and branch points. We evaluated HSF efficiency using a set of 83 intronic and 35 exonic mutations known to result in splicing defects. We showed that the mutation effect was correctly predicted in almost all cases. HSF could thus represent a valuable resource for research, diagnostic and therapeutic (e.g. therapeutic exon skipping) purposes as well as for global studies, such as the GEN2PHEN European Project or the Human Variome Project.
Year: 2009 PMID: 19339519 PMCID: PMC2685110 DOI: 10.1093/nar/gkp215
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Branch point matrix. The size of each nucleotide is proportional to its weight in the position weight matrix. Nucleotides above the base line have positive values while nucleotides below have negative values.
Figure 2. New position weight matrices of recognition motifs for proteins involved in splicing. (A) hnRNP A1; (B) Tra2-β and (C) 9G8.
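As an illustration of how a position weight matrix such as those in Figures 1 and 2 can be applied, the following minimal sketch scores every window of a sequence by summing per-position nucleotide weights and reports the best hit. The three-position matrix `TOY_PWM`, its weights, and the function names are hypothetical and chosen only for the example; they are not the HSF matrices or the HSF scoring procedure.

```python
# Toy position weight matrix (PWM): the weights below are invented for
# illustration and are NOT the HSF branch point or splicing-factor matrices.
TOY_PWM = {
    "A": [-1.0, 2.0, 0.5],
    "C": [1.0, -2.0, 0.0],
    "G": [-0.5, -1.0, -1.0],
    "T": [0.5, 0.0, 1.5],
}
MOTIF_LENGTH = 3


def score_window(window, pwm=TOY_PWM):
    """Sum the weight of each nucleotide at its position in the motif."""
    return sum(pwm[base][i] for i, base in enumerate(window))


def best_match(sequence, pwm=TOY_PWM, length=MOTIF_LENGTH):
    """Slide the matrix along the sequence; return (offset, window, score) of the best hit."""
    candidates = [
        (i, sequence[i:i + length], score_window(sequence[i:i + length], pwm))
        for i in range(len(sequence) - length + 1)
    ]
    return max(candidates, key=lambda hit: hit[2])


print(best_match("GGCATG"))  # (2, 'CAT', 4.5)
```

Positive weights raise the score of a window and negative weights lower it, which mirrors the above/below base line convention described in the Figure 1 caption.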
Branch point sequences
| Gene | Intron | References | Ref BP | Ref Seq | HSF BP | HSF value |
|---|---|---|---|---|---|---|
| | 32 | ( | −27 | ENST00000355306 | −27 | 87.81 |
| | 31 | ( | −33 | ENST00000258104 | −33 | 93.13 |
| | 30 | ( | −24 | ENST00000262464 | −24 | 77.06 |
| | 3 | ( | −21 | ENST00000323322 | −26 | 73.36 |
| | 31 | ( | −17 | ENST00000200181 | −17 | 93.79 |
| | 4 | ( | −20 | ENST00000264005 | −20 | 95.07 |
| | 9 | ( | −25 | ENST00000252444 | −25 | 86.59 |
| | 6 | ( | −28 | ENST00000269228 | −28 | 77.41 |
| | 2 | ( | −25 | ENST00000268261 | −25 | 80.56 |
| | 7 | ( | −23 | ENST00000268261 | −23 | 72.27 |
| | 23 | ( | −26 | ENST00000267163 | −26 | 75.89 |
| | 11 | ( | −22 | ENST00000324155 | −22 | 84.96 |
| | 38 | ( | −18 | ENST00000219476 | −18 | 67.71 |
| | 3 | ( | −24 | ENST00000285021 | −24 | 82.78 |
For each gene the reference sequence from the Ensembl genome database (Ref Seq), the intron number (Intron) and the position of the BP identified by in vitro experiments (Ref BP) as well as the BP position predicted by HSF (HSF BP) and the corresponding BP value (HSF value) are shown.
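The agreement between experimental and predicted branch points in this table can be tallied with a short script. The sketch below copies the Ref Seq, intron, Ref BP, HSF BP and HSF value columns as they appear above; the variable and function names are illustrative only.

```python
# (Ensembl transcript, intron, experimental BP position, HSF BP position, HSF value)
BRANCH_POINTS = [
    ("ENST00000355306", 32, -27, -27, 87.81),
    ("ENST00000258104", 31, -33, -33, 93.13),
    ("ENST00000262464", 30, -24, -24, 77.06),
    ("ENST00000323322", 3, -21, -26, 73.36),
    ("ENST00000200181", 31, -17, -17, 93.79),
    ("ENST00000264005", 4, -20, -20, 95.07),
    ("ENST00000252444", 9, -25, -25, 86.59),
    ("ENST00000269228", 6, -28, -28, 77.41),
    ("ENST00000268261", 2, -25, -25, 80.56),
    ("ENST00000268261", 7, -23, -23, 72.27),
    ("ENST00000267163", 23, -26, -26, 75.89),
    ("ENST00000324155", 11, -22, -22, 84.96),
    ("ENST00000219476", 38, -18, -18, 67.71),
    ("ENST00000285021", 3, -24, -24, 82.78),
]


def concordance(rows):
    """Return (number of matching positions, total rows, list of mismatches)."""
    mismatches = [
        (transcript, ref_bp, hsf_bp)
        for transcript, _intron, ref_bp, hsf_bp, _value in rows
        if ref_bp != hsf_bp
    ]
    return len(rows) - len(mismatches), len(rows), mismatches


matched, total, mismatches = concordance(BRANCH_POINTS)
print(f"{matched}/{total} branch points predicted at the experimental position")
for transcript, ref_bp, hsf_bp in mismatches:
    print(f"{transcript}: experimental {ref_bp}, HSF {hsf_bp}")
```

With the values listed here, 13 of the 14 predicted positions coincide with the experimental ones; the single discrepancy is the intron 3 entry of ENST00000323322 (−21 versus −26).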
Figure 3. Distribution of consensus values (CVs) for (A) 3′ and (B) 5′ natural splice sites (3′ss and 5′ss). Data were extracted from the Ensembl dataset (release 44, http://april2007.archive.ensembl.org) (20) using the HSF algorithm.
Intronic mutations in FBN1 (ENST00000316623), FBN2 (ENST00000262464), RB1 (ENST00000267163), TGFBR2 (ENST00000295754), MLH1 (ENST00000231790) and MSH2 (ENST00000233146) that lead to splicing defects
| Gene | Mutation | References | WT CV | Mutant CV | CV variation (%) |
|---|---|---|---|---|---|
| Mutations causing exon skipping | | | | | |
| | c.247+1G>A | ( | 82.26 | 55.42 | −32.62 |
| | c.538+1G>A | ( | 83.99 | 57.15 | −31.96 |
| | c.1468+5G>A | ( | 84.46 | 72.30 | −14.40 |
| | c.3208+5G>T | ( | 94.98 | 82.66 | −12.97 |
| | c.3838+1G>A | ( | 95.84 | 69.01 | −28.00 |
| | c.3839-1G>T | ( | 87.62 | 58.67 | −33.04 |
| | c.3964+1G>A | ( | 90.04 | 63.20 | −29.80 |
| | c.3965-2A>T | ( | 89.30 | 60.35 | −32.41 |
| | c.4459+1G>A | ( | 97.66 | 70.83 | −27.47 |
| | c.4943-1G>C | ( | 79.77 | 50.82 | −36.29 |
| | c.5788+5G>A | ( | 88.06 | 75.89 | −13.82 (a) |
| | c.6163+2del6 | ( | 99.05 | 72.90 | −26.40 |
| | c.6496+2insTG | ( | 82.21 | 32.05 | −61.01 |
| | c.6616+1G>C | ( | 78.08 | 51.24 | −34.37 |
| | c.6997+1G>A | ( | 92.11 | 65.27 | −29.13 |
| | c.7205-2A>G | ( | 84.11 | 55.16 | −34.42 |
| | c.7330+1G>A | ( | 98.02 | 71.18 | −27.38 |
| | c.7331-2A>G | ( | 80.72 | 51.77 | −35.86 |
| | c.8051+1G>A | ( | 92.02 | 65.18 | −29.16 |
| | c.8051+5G>A | ( | 92.02 | 79.85 | −13.22 |
| | c.8052-2A>G | ( | 92.86 | 63.92 | −31.17 |
| | c.3472+2T>G | ( | 90.99 | 64.15 | −28.53 |
| | c.4099+1G>C | ( | 91.66 | 64.82 | −29.28 |
| | c.4222+5G>A | ( | 92.11 | 79.94 | −13.21 |
| | c.4346-2A>T | ( | 90.91 | 61.96 | −31.84 |
| | c.264+4delA | ( | 91.34 | 84.93 | −7.01 |
| | c.380+3A>C | ( | 95.10 | 78.82 | −17.12 |
| | c.607+1G>T | ( | 99.05 | 72.21 | −27.09 |
| | c.939+4A>G | ( | 83.75 | 75.41 | −9.96 |
| | c.1049+2delT | ( | 76.95 | 57.00 | −25.90 |
| | c.1215+1G>A | ( | 85.86 | 59.02 | −31.26 |
| | c.1389+1G>A | ( | 82.69 | 55.86 | −32.45 |
| | c.1389+4A>G | ( | 82.69 | 74.35 | −10.09 |
| | c.1389+5G>A | ( | 82.69 | 70.53 | −14.71 |
| | c.1422-2A>T | ( | 86.12 | 57.17 | −33.62 |
| | c.1422-1G>A | ( | 86.12 | 57.17 | −33.62 |
| | c.1498+5G>A | ( | 82.91 | 70.75 | −14.67 |
| | c.1960+1G>A | ( | 94.02 | 67.19 | −28.54 |
| | c.1960+1delG | ( | 94.02 | 49.62 | −47.22 |
| | c.2211+1G>T | ( | 89.90 | 63.06 | −29.86 |
| | c.2212-2A>G | ( | 89.09 | 60.15 | −32.48 |
| | c.2211+1G>C | ( | 89.90 | 63.06 | −29.86 |
| | c.2520+1G>A | ( | 92.22 | 65.39 | −29.10 |
| | c.2520+3del4 | ( | 92.22 | 72.26 | −21.64 |
| | c.2663+1G>A | ( | 88.37 | 61.54 | −30.36 |
| | c.306+4A>G | ( | 96.07 | 87.73 | −8.68 |
| | c.454-2A>G | ( | 93.59 | 64.64 | −28.80 |
| | c.790+1G>A | ( | 83.28 | 56.45 | −32.22 |
| | c.790+5G>T | ( | 83.28 | 70.97 | −14.79 |
| | c.791-5T>G | ( | 80.80 | 77.17 | −4.49 |
| | c.884+4A>G | ( | 85.75 | 77.41 | −9.73 |
| | c.366+1G>T | ( | 86.73 | 59.89 | −30.95 |
| | c.793-2A>C | ( | 83.98 | 55.04 | −34.46 |
| | c.942+3A>T | ( | 99.24 | 83.86 | −15.50 |
| | c.1276+2T>A | ( | 84.70 | 57.86 | −31.69 |
| | c.1386+1G>A | ( | 89.02 | 62.19 | −30.13 |
| | c.2634+5G>T | ( | 84.41 | 72.09 | −14.59 |
| Mutations resulting in the usage of cryptic splice sites | | | | | |
| | c.2293+2T>C | ( | 89.77 | 62.94 | −29.89 |
| | | | 67.21 | | 5′ CS (51 nt upstream) |
| | c.3463+1G>A | ( | 91.34 | 64.50 | −29.38 |
| | | | 88.47 | | 5′ CS (27 nt downstream) |
| | c.4747+5G>T | ( | 89.13 | 76.81 | −13.82 |
| | | | 79.06 | | 5′ CS (48 nt upstream) |
| | c.5788+1G>A | ( | 88.06 | 61.22 | −30.48 |
| | | | 82.64 | | 5′ CS (33 nt downstream) |
| | c.138-8T>G | ( | 81.62 | 79.69 | −2.36 |
| | | | 55.35 | 84.29 | 3′ CS (7 nt upstream) |
| | c.501-1G>A | ( | 97.50 | 68.55 | −29.69 |
| | | | 54.82 | 83.77 | 3′ CS (1 nt downstream) |
| | c.607+1delG | ( | 99.05 | 22.54 | −77.24 |
| | | | 42.51 | 88.47 | 5′ CS (1 nt upstream) |
| | c.1815-2A>G | ( | 75.42 | 46.47 | −38.39 |
| | | | 81.84 | | 3′ CS (19 nt downstream) |
| | c.2107-2A>G | ( | 80.73 | 51.78 | −35.86 |
| | | | 69.56 | | 3′ CS (35 nt downstream) |
| | c.95-2A>G | ( | 91.77 | 62.82 | −31.55 |
| | | | 68.28 | | 3′ CS (18 nt downstream) |
| | c.1397-2A>G | ( | 92.32 | 63.38 | −31.35 |
| | | | 84.32 | | 3′ CS (30 nt upstream) |
| | c.1397-1G>A | ( | 92.32 | 63.38 | −31.35 |
| | | | 84.32 | | 3′ CS (30 nt upstream) |
CS: cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence with +1 corresponding to the A of the ATG translation initiation codon.
(a) The mutation induces exon skipping.
(b) A cryptic splice site not created by the mutation and used in vivo was correctly predicted by HSF.
(c) The cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.
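The CV variation column in the table above appears to be the relative change of the mutant CV with respect to the wild-type CV, expressed as a percentage. This formula is inferred from the tabulated numbers rather than quoted from the text, so the sketch below should be read as an assumption that is checked against a few rows; the function name is illustrative.

```python
def cv_variation(wt_cv, mutant_cv):
    """Relative change of the mutant CV with respect to the wild-type CV, in percent."""
    return (mutant_cv - wt_cv) / wt_cv * 100.0


# (mutation, WT CV, mutant CV, tabulated CV variation) copied from the table above
ROWS = [
    ("c.247+1G>A", 82.26, 55.42, -32.62),
    ("c.538+1G>A", 83.99, 57.15, -31.96),
    ("c.1468+5G>A", 84.46, 72.30, -14.40),
    ("c.6496+2insTG", 82.21, 32.05, -61.01),
    ("c.607+1delG", 99.05, 22.54, -77.24),
    ("c.138-8T>G", 81.62, 79.69, -2.36),
]

for name, wt, mutant, tabulated in ROWS:
    computed = cv_variation(wt, mutant)
    print(f"{name}: computed {computed:.2f}%, tabulated {tabulated:.2f}%")
```

The computed values agree with the tabulated ones to within about 0.01 percentage points; the small residual differences are presumably due to rounding of the CVs shown in the table.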
Exonic mutations in DMD (ENST00000357033), MLH1 (ENST00000231790), MSH2 (ENST00000233146) and RB1 (ENST00000267163) involved in splicing
| Gene | Mutation | Position | References | WT CV | Mutant CV | CV variation (%) |
|---|---|---|---|---|---|---|
| | c.5985T>G | Deep exonic | ( | 46.65 | 75.59 | 3′ CS (63 nt downstream) |
| | c.677G>A | Last base | ( | 84.46 | 73.89 | −12.52 |
| | c.882C>T | Exonic | ( | 84.46 | 73.89 | −12.52 |
| | c.1037A>G | Penultimate base | ( | 93.04 | 88.19 | −5.22; 5′ CS (upstream) |
| | c.1038G>T | Last base | ( | 93.04 | 82.17 | −11.68; 5′ CS (upstream) |
| | c.1667G>T | Last base | ( | 85.85 | 74.99 | −12.66; 5′ CS (88 nt downstream) |
| | c.1731G>A | Last base | ( | 93.27 | 82.69 | −11.34 |
| | c.1989G>T | Last base | ( | 93.22 | 82.35 | −11.66 |
| | c.1660A>T | Penultimate base | ( | 84.00 | 79.25 | −5.65; 5′ CS (82 nt upstream) |
| | c.1759G>C | Last base | ( | 85.66 | 74.65 | −12.86 |
| | c.1915C>T | Deep exonic | ( | 62.19 | 89.02 | 5′ CS (92 nt upstream) |
| | c.658C>G | Deep exonic | ( | 58.66 | 85.49 | 5′ CS (61 nt upstream) |
| | c.939G>T | Last base | ( | 83.75 | 72.88 | −12.98 |
| | c.1960G>C | Last base | ( | 94.02 | 83.01 | −11.71 |
| | c.1960G>A | Last base | ( | 94.02 | 83.44 | −11.25 |
CS: cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence with +1 corresponding to the A of the ATG translation initiation codon.
(a) The cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.
(b) The mutation induces exon skipping.
(c) The cryptic splice site used in vitro was not clearly reported and therefore was not available for comparison.
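Because the mutation names in these tables use cDNA numbering, with intronic positions written as an offset from the nearest exonic nucleotide, they can be split into their components mechanically. The following sketch handles simple substitutions only and tolerates the spacing and minus-sign variants seen in extracted tables; the function name and the returned dictionary layout are assumptions made for the example.

```python
import re

# c.<position>[+/-offset]<ref>><alt>, e.g. c.247+1G>A or c.5985T>G
_SUBSTITUTION = re.compile(r"^c\.(\d+)([+-]\d+)?([ACGT])>([ACGT])$")


def parse_substitution(name):
    """Parse a cDNA substitution name; return None for anything else (deletions, insertions)."""
    # Normalise the typographic minus sign and drop spaces around the offset.
    cleaned = name.replace("\u2212", "-").replace(" ", "")
    match = _SUBSTITUTION.match(cleaned)
    if match is None:
        return None
    position, offset, ref, alt = match.groups()
    return {
        "position": int(position),             # last exonic nucleotide used as anchor
        "offset": int(offset) if offset else 0,  # 0 means the change is exonic
        "ref": ref,
        "alt": alt,
    }


print(parse_substitution("c.247 + 1G>A"))
print(parse_substitution("c.3965 \u2212 2A>T"))
print(parse_substitution("c.5985T>G"))
print(parse_substitution("c.6163 + 2del6"))  # not a substitution
```

A positive offset places the change in the intron downstream of the anchor nucleotide (donor side), and a negative offset places it in the intron upstream (acceptor side).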
Exonic mutations known to result in exon skipping through ESE inactivation or ESS activation
| Gene | Mutation | Ref. | Motif | Ref Seq | HSF prediction |
|---|---|---|---|---|---|
| | c.362C>T | ( | −ESE (SF2/ASF) | ENST00000370841 | −9G8; −SF2/ASF; +EIE; −SRp40; −EIE; +IIE |
| | c.5080G>T | ( | ? | ENST00000357654 | −EIE; +SRp55; −9G8; −SF2/ASF; −IIE; +IIE; −ESS; +hnRNPA1 |
| | c.8165C>G | ( | −ESE | ENST00000380152 | −SRp40; −ESE; +ESE; −SRp55; −SF2/ASF; −EIE |
| | c.5081G>T | ( | ? | ENST00000380152 | +SC35; +SRp40; −ESE (f,h) ×2 (5080_5086); −9G8; −ESS |
| | c.4250T>A | ( | +ESS (hnRNPA1) | ENST00000357033 | +9G8; −EIE; +ESE; −IIE; +hnRNPA1 |
| | c.544A>G | ( | ? | ENST00000231790 | +ESS; 5′ss ΔCV = −6.30 |
| | c.793C>T | ( | ? | ENST00000231790 | +ESS |
| | c.794G>A | ( | ? | ENST00000231790 | −SRp40; −SC35; +ESS |
| | c.882C>T | ( | ? | ENST00000231790 | +SC35; −SRp55 |
| | c.988_990del | ( | ? | ENST00000231790 | +SF2/ASF; −SRp55; +9G8; −ESS |
| | c.815C>T | ( | ? | ENST00000233146 | −SRp55; +ESS; +ESS |
| | c.274_276del | ( | ? | ENST00000233146 | +SC35; +SRp40; −IIE |
| | c.2230C>T | ( | ? | ENST00000354729 | −SF2/ASF; +ESS; +IIE; +ESS |
| | c.557A>T | ( | −ESE | ENST00000356175 | −SRp55; −ESE; −EIE; −9G8; +ESS |
| | c.910C>T | ( | −ESE | ENST00000356175 | −9G8; −EIE; +ESE; −ESE; −ESS |
| | c.943C>T | ( | −ESE | ENST00000356175 | −SC35; −SF2/ASF; −PESE; −9G8; +hnRNPA1; +IIE |
| | c.1007G>A | ( | −ESE | ENST00000356175 | +PESE; −EIE; +9G8; +ESE; −ESS; −IIE; +hnRNPA1 |
| | c.5719G>T | ( | −ESE | ENST00000356175 | −ESE; −EIE; −ESS; +PESS; +hnRNPA1 |
| | c.6792C>A | ( | −ESE | ENST00000356175 | +ESE; −EIE; +Tra2-β; −ESS |
| | c.6792C>G | ( | −ESE | ENST00000356175 | +ESE; −EIE; +Tra2-β; −ESS; +hnRNPA1 |
+: a new site was created by the mutation; −: the motif was abolished by the mutation. Algorithms and matrices used to identify the motifs were:
(a) Silencer motifs from Sironi et al. (31).
(b) PESS octamers (28).
(c) IIEs (30).
(d) hnRNP motifs from HSF.
(e) ESE Finder matrices (19).
(f) RESCUE ESE hexamers (63).
(g) PESE octamers (28).
(h) EIEs (30).
(i) ESE motifs from HSF.
When multiple adjacent sites were predicted, the number of sites is indicated: ×5 means that five adjacent sites were modified by the mutation. Nucleotide numbering reflects the reference cDNA sequence with +1 corresponding to the A of the ATG translation initiation codon.
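The +/− convention used in the HSF prediction column lends itself to a simple tally of created versus abolished motifs per mutation. The sketch below is one hypothetical way to do this; the function names are illustrative, and the example list is the set of predictions shown above for c.4250T>A.

```python
from collections import Counter


def parse_prediction(token):
    """Split a prediction such as '+hnRNPA1' or '−EIE' into (effect, motif)."""
    token = token.strip().replace(" ", "")
    sign, motif = token[0], token[1:]
    if sign == "+":
        return "created", motif
    if sign in ("\u2212", "-"):  # typographic minus or ASCII hyphen
        return "abolished", motif
    raise ValueError(f"prediction without a leading +/- sign: {token!r}")


def summarise(predictions):
    """Count how many motifs a mutation creates and how many it abolishes."""
    return Counter(parse_prediction(p)[0] for p in predictions)


# HSF predictions listed in the table for c.4250T>A
c4250 = ["+9G8", "\u2212EIE", "+ESE", "\u2212IIE", "+hnRNPA1"]
print(summarise(c4250))
print([parse_prediction(p) for p in c4250])
```

Only the first character is treated as the sign, so motif names that themselves contain a hyphen or minus sign, such as Tra2-β, are left intact.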