Abstract
The frequency distribution of mutation-induced aberrant 3' splice sites (3'ss) in exons and introns is more complex than for 5' splice sites, largely owing to sequence constraints upstream of intron/exon boundaries. As a result, prediction of their localization remains a challenging task. Here, nucleotide sequences of 218 previously reported aberrant 3'ss activated by disease-causing mutations in 131 human genes were compared with their authentic counterparts using currently available splice site prediction tools. Each tested algorithm distinguished authentic 3'ss from cryptic sites more effectively than from de novo sites. The best discrimination between aberrant and authentic 3'ss was achieved by the maximum entropy model. Almost one half of the aberrant 3'ss were activated by AG-creating mutations, and approximately 95% of the newly created AGs were selected in vivo. The overall nucleotide structure upstream of aberrant 3'ss was characterized by a higher purine content than for authentic sites, particularly at position -3, which may be compensated by more stringent requirements for positive and negative nucleotide signatures centred around position -11. A newly developed online database of aberrant 3'ss will facilitate identification of splicing mutations in a gene or phenotype of interest and future optimization of splice site prediction tools.
Year: 2006 PMID: 16963498 PMCID: PMC1636351 DOI: 10.1093/nar/gkl535
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Summary of aberrant 3′ss
| Location of cryptic or ‘de novo’ 3′ss | Exon: mutation in 3′YAG (cryptic) | Exon: mutation outside 3′YAG (‘de novo’) | Intron: mutation in 3′YAG (cryptic) | Intron: mutation outside 3′YAG (‘de novo’) | Both: all mutations |
|---|---|---|---|---|---|
| Number of genes | 54 | 25 | 23 | 56 | 131 |
| Number of cryptic and ‘de novo’ 3′ss (per cent) | 88 (39) | 32 (14) | 29 (13) | 78 (34) | 227 (100) |
| Number of unique 3′ss (per cent) | 83 (38) | 29 (13) | 28 (13) | 78 (36) | 218 (100) |
| Number of aberrant 3′ss affecting terminal exons (per cent) | 11 (13) | 4 (14) | 8 (29) | 4 (5) | 27 (12) |
| Median distance (nucleotides) between authentic and aberrant 3′ss | 12 | 55 | −44 | −14 | 1 |
| Change in the reading frame for unique aberrant 3′ss | |||||
| 0 | 29 | 10 | 8 | 27 | 74 |
| +1 | 38 | 7 | 13 | 26 | 84 |
| +2 | 21 | 15 | 8 | 25 | 69 |
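The reading-frame rows above can be related to the distance between an authentic and an aberrant 3′ss. The sketch below assumes that the frame class is simply the distance in nucleotides modulo 3; this convention, the function name `frame_change` and the example distances are illustrative assumptions, not taken from the source.

```python
from collections import Counter


def frame_change(distance_nt: int) -> int:
    """Return the frame class (0, 1 or 2) for a distance in nucleotides.

    Assumed convention: frame class = distance modulo 3. Python's % operator
    returns a non-negative result, so intronic (negative) distances also map
    to 0, 1 or 2.
    """
    return distance_nt % 3


# Hypothetical distances, for illustration only (not data from the table).
example_distances = [6, 7, -5, -9, 20]

# Tally how many aberrant sites fall into each frame class.
frame_counts = Counter(frame_change(d) for d in example_distances)
```

Under this assumption, a class of 0 leaves the reading frame intact, whereas classes 1 and 2 shift it.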
Summary of mutations leading to aberrant 3′ss
| Location of cryptic or ‘de novo’ 3′ss | Exon: mutation in 3′YAG (cryptic) | Exon: mutation outside 3′YAG (‘de novo’) | Intron: mutation in 3′YAG (cryptic) | Intron: mutation outside 3′YAG (‘de novo’) | Both: all mutations |
|---|---|---|---|---|---|
| Number of deletions/insertions | 7 | 3 | 0 | 6 | 16 |
| Number of single-nucleotide substitutions | 81 | 29 | 29 | 72 | 211 |
| Wild-type nucleotide | |||||
| A | 36 | 3 | 13 | 29 | 81 |
| C | 2 | 9 | 5 | 9 | 25 |
| G | 43 | 13 | 10 | 18 | 84 |
| T | 0 | 4 | 1 | 16 | 21 |
| Mutated nucleotide | |||||
| A | 23 | 8 | 5 | 28 | 64 |
| C | 21 | 3 | 5 | 2 | 31 |
| G | 25 | 9 | 14 | 41 | 89 |
| T | 12 | 9 | 5 | 1 | 27 |
| Number of AG-creating mutations | |||||
| Total number (%) | 16 (7) | 12 (5) | 8 (4) | 62 (27) | 98 (43) |
| Not used as aberrant 3′ss (%) | 0 | 5 | 2 | 4 | 11 (5) |
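The AG-creating category in the table can be illustrated with a small check of whether a single-nucleotide substitution introduces a new AG dinucleotide. This is a minimal sketch: the function name `creates_ag`, the 0-based coordinate convention and the example sequences are assumptions made for illustration and do not come from the source database.

```python
def creates_ag(wild_type: str, position: int, new_base: str) -> bool:
    """Return True if the substitution creates an AG absent from the wild type.

    wild_type: nucleotide sequence (DNA alphabet assumed)
    position:  0-based index of the substituted nucleotide
    new_base:  the mutated nucleotide
    """
    seq = wild_type.upper()
    new_base = new_base.upper()
    if not 0 <= position < len(seq):
        raise ValueError("position lies outside the sequence")
    if seq[position] == new_base:
        raise ValueError("new_base is identical to the wild-type nucleotide")

    mutant = seq[:position] + new_base + seq[position + 1:]

    # A substitution can only affect the two dinucleotides that overlap it:
    # the one starting at position - 1 and the one starting at position.
    for start in (position - 1, position):
        if start < 0 or start + 2 > len(seq):
            continue
        if mutant[start:start + 2] == "AG" and seq[start:start + 2] != "AG":
            return True
    return False
```

For example, `creates_ag("CTTTATCC", 5, "G")` reports a new AG because the T>G change turns AT into AG, whereas a T>C change at the same position does not.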
Number of single-nucleotide substitutions in 3′YAG that resulted in cryptic 3′ss
| Location of cryptic 3′ splice site | Observed: exon | Observed: intron | Observed: both | Expected: mono-nucleotide<sup>a</sup> | Expected: di-nucleotide<sup>b</sup> |
|---|---|---|---|---|---|
| Point mutations in position IVS-1 | 43 | 10 | 53 | — | — |
| −1G>A | 23 | 4 | 27 | 25.2<sup>c</sup> | 31.5<sup>d</sup> |
| −1G>C | 14 | 3 | 17 | 10.9 | 13.2 |
| −1G>T | 6 | 3 | 9 | 16.9 | 8.3 |
| Point mutations in position IVS-2 | 36 | 12 | 48 | — | — |
| −2A>C | 7 | 2 | 9 | 9.3<sup>e</sup> | 8.8<sup>f</sup> |
| −2A>G | 23 | 8 | 31 | 30.6 | 35.5 |
| −2A>T | 6 | 2 | 8 | 8.1 | 3.7 |
| Point mutations in position IVS-3 | 2 | 7 | 9 | — | — |
| −3C>N | 2 | 5 | 7 | — | — |
| −3T>N | 0 | 1 | 1 | — | — |
| −3A>N | 0 | 1 | 1 | — | — |
Expected numbers of both exonic and intronic cryptic 3′ss were calculated as a weighted average of relative mono-nucleotide<sup>a</sup> (47) and di-nucleotide<sup>b</sup> (48) mutability rates in the sense and antisense DNA strands that were published previously for a large number of point mutations in human disease genes. Relative substitution rates at the di-nucleotide level allow for nearest-neighbour effects, as previously described (48). <sup>c</sup>χ² = 6.2, P = 0.046; <sup>d</sup>χ² = 1.3, P > 0.05; <sup>e</sup>χ² = 0.02, P > 0.05; <sup>f</sup>χ² = 4.4, P > 0.05.
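The comparison of observed and expected counts can be sketched as a goodness-of-fit calculation. The example below assumes a plain Pearson χ² statistic over the three possible substitutions at one position, which gives 2 degrees of freedom; the authors' exact procedure is not stated in this record, so the function names and the choice of statistic are assumptions. The observed and expected values are taken from the −2A>N rows of the table (both locations, mono-nucleotide expectation).

```python
import math


def pearson_chi2(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df2(chi2):
    """Upper-tail P-value of a chi-square variable with 2 degrees of freedom.

    For exactly 2 degrees of freedom the survival function has the closed
    form exp(-chi2 / 2), so no statistics library is needed.
    """
    return math.exp(-chi2 / 2.0)


# -2A>C, -2A>G, -2A>T: observed counts (both locations) and
# mono-nucleotide expected counts, as listed in the table.
observed = [9, 31, 8]
expected_mono = [9.3, 30.6, 8.1]

chi2 = pearson_chi2(observed, expected_mono)
p = p_value_df2(chi2)
```

Under these assumptions the statistic for this row rounds to 0.02, in line with footnote e, and a χ² of 6.2 with 2 degrees of freedom corresponds to P of about 0.045, close to the value given in footnote c.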
Figure 1. Frequency distribution of 184 intronic point mutations that activated aberrant 3′ss
Comparison of the strength of authentic, mutated and aberrant 3′ss
| Location of aberrant 3′ss | Exon: mutation in 3′YAG (cryptic) | Exon: mutation outside 3′YAG | Intron: mutation in 3′YAG (cryptic) | Intron: mutation outside 3′YAG | Both: all mutations |
|---|---|---|---|---|---|
| Shapiro and Senapathy matrix score | |||||
| A (SD) | 84.5 (6.5) | 79.3 (9.8) | 85.0 (6.1) | 81.0 (7.8) | 82.6 (7.7) |
| M (SD) | 67.5 (8.0) | 78.9 (9.4) | 70.2 (6.6) | 79.4 (8.7) | 73.5 (10.0) |
| CR (SD) | 74.5 (9.2) | 78.6 (10.4) | 82.7 (8.5) | 77.3 (9.0) | 77.1 (9.5) |
| A–M | 17.1 | 0.5 | 14.8 | 1.6 | 9.2 |
| M–CR | 7.1 | −0.3 | 12.4 | −2.1 | 3.6 |
| A–CR | 10.0 | 0.7 | 2.4 | 3.6 | 5.5 |
| Maximum entropy model | |||||
| A (SD) | 8.7 (3.5) | 6.7 (4.0) | 8.6 (3.0) | 7.3 (2.9) | 7.9 (3.4) |
| M (SD) | −0.3 (4.5) | 5.8 (5.1) | −0.4 (3.6) | 3.8 (4.1) | 2.0 (5.0) |
| CR (SD) | 4.1 (3.9) | 5.5 (5.6) | 6.4 (3.6) | 5.2 (4.5) | 5.0 (4.4) |
| A–M | 8.4 | 0.5 | 9.0 | 3.2 | 5.6 |
| M–CR | 4.1 | −0.4 | 6.8 | 1.8 | 3.0 |
| A–CR | 4.4 | 0.9 | 2.2 | 1.4 | 2.6 |
| First-order Markov model | |||||
| A (SD) | 9.1 (3.0) | 7.1 (4.1) | 8.9 (2.8) | 7.5 (3.0) | 8.2 (3.2) |
| M (SD) | 0.1 (3.9) | 6.2 (5.1) | 0.3 (3.0) | 4.0 (4.1) | 2.3 (4.7) |
| CR (SD) | 4.4 (3.9) | 5.3 (5.9) | 7.1 (3.2) | 5.4 (4.8) | 5.2 (4.6) |
| A–M | 8.9 | 0.8 | 8.5 | 3.5 | 5.9 |
| M–CR | 4.3 | −0.9 | 6.7 | 1.4 | 2.9 |
| A–CR | 4.7 | 1.7 | 1.8 | 2.1 | 3.0 |
| Weight matrix model | |||||
| A (SD) | 9.9 (3.8) | 7.3 (4.3) | 9.6 (3.0) | 7.5 (4.1) | 8.7 (4.0) |
| M (SD) | 1.1 (4.3) | 7.1 (4.2) | 1.4 (3.1) | 6.6 (4.0) | 3.9 (4.9) |
| CR (SD) | 4.6 (4.8) | 6.2 (5.3) | 8.1 (3.5) | 6.2 (4.8) | 5.8 (4.8) |
| A–M | 8.8 | 0.3 | 8.2 | 0.9 | 4.8 |
| M–CR | 3.6 | −0.9 | 6.8 | −0.4 | 2.0 |
| A–CR | 5.3 | 1.2 | 1.5 | 1.3 | 2.8 |
| Information content | |||||
| A (SD) | 9.9 (3.9) | 8.1 (3.9) | 9.6 (3.1) | 7.9 (3.9) | 8.9 (3.9) |
| M (SD) | 2.1 (3.8) | 7.6 (3.8) | 2.4 (3.3) | 7.0 (3.7) | 4.6 (4.5) |
| CR (SD) | 5.9 (3.7) | 7.2 (3.3) | 8.0 (3.5) | 7.5 (3.6) | 6.9 (3.7) |
| A–M | 7.8 | 0.5 | 7.2 | 0.8 | 4.3 |
| M–CR | 3.5 | −0.7 | 5.5 | 0.1 | 2.2 |
| A–CR | 4.1 | 1.0 | 1.7 | 0.7 | 2.2 |
Means and SD of splice prediction scores (Shapiro and Senapathy matrix, S&S; maximum entropy model, ME; first-order Markov model, MM; weight matrix model) or bits (information content) for authentic (A), mutated authentic (M) and aberrant (CR) 3′ss. The length of input sequences was 15 (−14 to +1 relative to the authentic 3′ss), 23 (−20 to +3), 23 (−20 to +3), 23 (−20 to +3) and 28 (−26 to +2) nt, respectively. The information content algorithm failed to recognize 7 authentic, 12 mutated and 28 aberrant 3′ss used in vivo. These missing values (47/654, 7%) were replaced by the group mean.
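The input windows listed in the legend can be sketched as simple slices around the splice site. The example assumes the usual convention that there is no position 0, so that −1 is the last intronic nucleotide and +1 the first exonic nucleotide; the function name `splice_window`, the dictionary `WINDOWS` and the 0-based indexing are illustrative assumptions rather than part of any of the prediction tools.

```python
# (upstream, downstream) extent of each input window, from the table legend.
WINDOWS = {
    "Shapiro and Senapathy": (14, 1),
    "Maximum entropy": (20, 3),
    "First-order Markov": (20, 3),
    "Weight matrix": (20, 3),
    "Information content": (26, 2),
}


def splice_window(seq: str, first_exon_index: int, upstream: int, downstream: int) -> str:
    """Return the sequence from -upstream to +downstream around a 3' splice site.

    first_exon_index is the 0-based index of position +1 (first exonic
    nucleotide). Because there is no position 0, the window length is
    upstream + downstream.
    """
    start = first_exon_index - upstream
    end = first_exon_index + downstream
    if start < 0 or end > len(seq):
        raise ValueError("window extends beyond the supplied sequence")
    return seq[start:end]


# Window lengths implied by the legend: 15, 23, 23, 23 and 28 nt.
window_lengths = {name: up + down for name, (up, down) in WINDOWS.items()}
```

Adding the upstream and downstream extents reproduces the input lengths given in the legend, which serves as a quick check of the coordinate convention.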
Discrimination of computational tools between authentic, mutated and cryptic/de novo 3′ss
| Location of aberrant 3′ splice sites | Exon: mutation in 3′YAG (cryptic) | Exon: mutation outside 3′YAG (‘de novo’) | Intron: mutation in 3′YAG (cryptic) | Intron: mutation outside 3′YAG (‘de novo’) | Both: all mutations |
|---|---|---|---|---|---|
| Shapiro and Senapathy matrix score | |||||
| A–M | 0.4 | 0.16 | |||
| CR–M | 0.5 | 0.09 | |||
| A–CR | 0.4 | 0.13 | |||
| Maximum entropy model | |||||
| A–M | 0.3 | ||||
| CR–M | 0.4 | ||||
| A–CR | 0.2 | ||||
| First-order Markov model | |||||
| A–M | 0.3 | ||||
| CR–M | 0.2 | ||||
| A–CR | 0.1 | ||||
| Weight matrix model | |||||
| A–M | 0.4 | 0.09 | |||
| CR–M | 0.2 | 0.4 | |||
| A–CR | 0.2 | 0.07 | |||
| Information content | |||||
| A–M | 0.3 | 0.08 | |||
| CR–M | 0.3 | 0.2 | |||
| A–CR | 0.2 | 0.2 | |||
Table cells contain P-values of Wilcoxon Mann–Whitney rank tests comparing authentic (A), mutated (M) and cryptic/de novo (CR) 3′ss. P-values < 0.05 are in bold.
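A rank test of this kind can be sketched in a few lines. The example below is a minimal implementation of the two-sided Wilcoxon Mann–Whitney test that uses average ranks for ties and a normal approximation without continuity correction; it is an illustration under those assumptions and not necessarily the software or settings used to produce the table. The function name `mann_whitney_u` is hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, approximate two-sided P-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and all receive the average rank.
        average_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0

    mean_u = n1 * n2 / 2.0
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_two_sided
```

In practice the two arguments would be the per-site scores of two groups, for example authentic and aberrant 3′ss scored by the same algorithm. For small samples an exact test is preferable to the normal approximation used here.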