Rosina Savisaar, Laurence D Hurst.
Abstract
In addition to coding information, human exons contain sequences necessary for correct splicing. These elements are known to be under purifying selection and their disruption can cause disease. However, the density of functional exonic splicing information remains profoundly uncertain. Several groups have experimentally investigated how mutations at different exonic positions affect splicing. They have found splice information to be distributed widely in exons, with one estimate putting the proportion of splicing-relevant nucleotides at >90%. These results suggest that splicing could place a major pressure on exon evolution. However, analyses of sequence conservation have concluded that the need to preserve splice regulatory signals only slightly constrains exon evolution, with a resulting decrease in the average human rate of synonymous evolution of only 1–4%. Why do these two lines of research come to such different conclusions? Among other reasons, we suggest that the methods are measuring different things: one assays the density of sites that affect splicing, the other the density of sites whose effects on splicing are visible to selection. In addition, the experimental methods typically consider short exons, thereby enriching for nucleotides close to the splice junction, such sites being enriched for splice-control elements. By contrast, in part owing to correction for nucleotide composition biases and to the assumption that constraint only operates on exon ends, the conservation-based methods can be overly conservative.
Keywords: Exon Inclusion; Exonic Splice; Motif Density; Splice Assay; Synonymous Site
Year: 2017 PMID: 28405812 PMCID: PMC5602102 DOI: 10.1007/s00439-017-1798-3
Source DB: PubMed Journal: Hum Genet ISSN: 0340-6717 Impact factor: 4.132
Overview of experimental studies on the prevalence of exonic splice information
| References | Exon (size) | Variants tested | Proportion of splice-associated sites | Proportion of splice-disrupting variants | Definition of splice alteration | Examples of diseases associated with gene |
|---|---|---|---|---|---|---|
| Pagani et al. ( | | Variants previously reported in patients and artificial variants | 21/29 (~72.4%) (includes some indels and multiple mutations) | 32/47 (~68.1%) (includes some indels and multiple mutations) | Undefined | Cystic fibrosis (Cheng et al. |
| | CFTR | | | | | |
| Tournier et al. ( | | Variants (of unknown significance or deleterious) from Lynch syndrome families | 13/67 (~19.4%) (includes short indels) | 13/67 (~19.4%) (includes short indels) | Determined using a | Lynch syndrome (Bonadona et al. |
| Thery et al. ( | | Variants of unknown significance from families undergoing genetic counselling | 6/30 (20.0%) | 6/30 (20.0%) | Undefined | Breast cancer (Antoniou et al. |
| Gaildrat et al. ( | | Variants of unknown significance from families undergoing genetic counselling | 6/8 (75.0%) | 6/8 (75.0%) | Undefined | Breast cancer (Antoniou et al. |
| Di Giacomo et al. ( | | Variants reported in breast and ovarian cancer patients | 7/23 (~30.4%) (includes small indels) | 8/26 (~30.8%) (includes small indels) | Undefined | See above |
| Kergourlay et al. ( | | Missense mutations reported as disease-causing | 5/24 (~20.8%) | 5/25 (20.0%) | Undefined | Muscular dystrophy (Bashir et al. |
| | SMN1 | | | | | |
| Soukarieh et al. ( | | All reported single-base substitutions (most from cancer patients) | 13/18 (~72.2%) | 17/22 (~77.3%) | PSI more than a single standard deviation removed from that observed in wild-type (standard deviation from three replicates) | See above |
| | FAS | | | | | |
| Tajnik et al. ( | | Haemophilia B associated single-base substitutions, selected either because their disease-causing mechanism was unclear or because they were located in a region thought to contain splice enhancer elements | 6/9 (~66.7%) | 9/17 (~52.9%) | Undefined | Haemophilia B (Bolton-Maggs and Pasi |
The column "Proportion of splice-disrupting variants" reports the fraction of tested variants that were classed as splice-altering. The column "Proportion of splice-associated sites" reports the fraction of tested sites in the exon at which any splice-altering variant was detected. Unless otherwise noted, only single-base substitutions are considered. The column "Definition of splice alteration" details the criteria used in the study for classifying a variant as splice-altering. Only exonic variants are considered.
Italicized rows correspond to studies classed here as belonging to the second subtype (studies that chose the variants to test in an unbiased manner)
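The difference between the two proportion columns can be made concrete with a short sketch. The data layout (tuples of exon position, substituted base, and a splice-altering flag) and the function name `splice_proportions` are illustrative assumptions, not taken from any of the studies in the table.

```python
from fractions import Fraction


def splice_proportions(variants):
    """Return (site_fraction, variant_fraction) for a set of tested variants.

    variants: iterable of (position, alt_base, is_splice_altering) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    variants = list(variants)
    if not variants:
        raise ValueError("no variants tested")
    # Sites at which at least one variant was tested.
    tested_sites = {pos for pos, _, _ in variants}
    # Sites at which at least one tested variant altered splicing.
    altering_sites = {pos for pos, _, altering in variants if altering}
    n_altering = sum(1 for _, _, altering in variants if altering)
    return (
        Fraction(len(altering_sites), len(tested_sites)),
        Fraction(n_altering, len(variants)),
    )


# Toy data (not from any study): five variants spread over three sites.
toy = [
    (10, "A", True),
    (10, "G", False),
    (11, "T", False),
    (12, "C", True),
    (12, "G", True),
]
site_fraction, variant_fraction = splice_proportions(toy)
print(site_fraction, variant_fraction)
```

In this toy case, two of three tested sites carry a splice-altering variant, while three of five tested variants are splice-altering, so the two columns need not agree when several variants are tested per site.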
Fig. 1 Percentage of splice-altering variants among variants tested (blue bars) or over-all percentage decrease in dS (synonymous rate of evolution)/d4 (fourfold degenerate rate of evolution) attributed to the need to preserve splice control elements (orange bars). The light blue bars correspond to subtype 1 (at least some variants chosen because of disease association) and the dark blue bars to subtype 2 (largely unbiased selection of variants). There is a large discrepancy between blue (experimental) and orange (computational) bars. Note, however, that the figures are directly comparable only if one assumes that the selection detected in the computational studies is strong enough to preclude all substitutions at selected sites (see “It is uncertain how to infer the density of selected sites from the decrease in dS”). Note also that the estimate from Savisaar and Hurst (2017) reflected selection on non-splice related RNA-binding protein target motifs as well.
Overview of computational studies on the evolutionary impact of exonic splice regulatory information
| References | Motif density | Decrease in dS or d4: in motifs | Decrease in dS or d4: over-all | Motifs | Control |
|---|---|---|---|---|---|
| Parmley et al. ( | ~30.42% | ~8.19% (including CpG sites)/11.03% (excluding CpG sites) (alignment to mouse) | ~2.49% (including CpG sites)/3.36% (excluding CpG sites) | 238 RESCUE-ESE ESE hexamers (Fairbrother et al. | Non-ESE sites |
| Cáceres and Hurst ( | 13.1–32.7% (exon ends only) | 8.5–17.1% (exon ends only, alignment to mouse) | 1.2–4% (extrapolated from exon ends to the full sequence) | Various sets of putative ESEs, formed by taking intersections of pre-existing sets | Either all non-ESE sites near exon ends or sites overlapping with nucleotide-matched control motifs |
| Savisaar and Hurst ( | ~57.3% | ~4.1% (alignment to macaque) | ~2.4% | 1483 motifs experimentally determined to be recognized by human RBPs | Sites that overlap dinucleotide-matched control motifs |
For Cáceres and Hurst (2013), the figures are presented as a range, as they depend on the set of motifs and the method of control used. Note that some studies considered dS (rate of evolution at synonymous sites) while others considered d4 (rate of evolution at fourfold degenerate sites). Parmley et al. (2006) also provided a second estimate for the over-all decrease in dS (~8%). However, only the lower estimate is reproduced here because of concerns that the reasoning used to derive the higher value may have been circular.
ESE: exonic splice enhancer; RBP: RNA-binding protein
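The over-all figures in the table appear to be consistent with scaling the in-motif decrease by the motif density. The sketch below checks that arithmetic for the point estimates only; treating the over-all value as a simple product is an assumption made here for illustration, and the function name `overall_decrease` is hypothetical.

```python
def overall_decrease(motif_density, decrease_in_motifs):
    """Over-all fractional decrease, ASSUMING it equals density x in-motif decrease."""
    return motif_density * decrease_in_motifs


# Point estimates copied from the table, written as fractions rather than percentages.
rows = {
    "Parmley et al., incl. CpG": (0.3042, 0.0819),
    "Parmley et al., excl. CpG": (0.3042, 0.1103),
    "Savisaar and Hurst": (0.573, 0.041),
}

# Convert each product back to a percentage, rounded to two decimals.
results = {
    name: round(100 * overall_decrease(density, in_motifs), 2)
    for name, (density, in_motifs) in rows.items()
}
for name, pct in results.items():
    print(f"{name}: ~{pct}%")
```

The products come out at roughly 2.49%, 3.36% and 2.35%, close to the tabulated over-all values. The Cáceres and Hurst row is left out because it is reported as a range extrapolated from exon ends rather than as a single product.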
Fig. 2 The distribution of exon lengths in the human genome is shown in orange (see Online Resource 1 for data). The dashed line marks the median of this distribution. The asterisks mark the natural logs of the lengths of the exons used in the experimental studies (studies that used more than one exon have been excluded). Note that the majority of these values are below the genomic median, and the three subtype 2 studies (dark blue) correspond to particularly low figures.
Fig. 3 Two models for the distribution of functional splice regulatory information along the exon. Under the first model, functional splice regulatory elements are rare but under strong purifying selection. Under the second model, functional splice regulatory elements are frequent but only weakly constrained. Intermediate scenarios are also possible.
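Both models can produce the same over-all decrease in the synonymous rate. As a rough sketch, assume the over-all decrease is the product of the fraction of sites under selection and the mean fractional rate reduction at those sites. This is a simplification introduced here, not a formula from the paper, and the function name `implied_density` is hypothetical.

```python
def implied_density(overall_decrease, mean_constraint):
    """Fraction of sites under selection implied by an over-all rate decrease.

    ASSUMES overall_decrease = density * mean_constraint, where mean_constraint
    is the fractional reduction in substitution rate at selected sites
    (1.0 means no substitutions are tolerated). Illustrative simplification.
    """
    if not 0 < mean_constraint <= 1:
        raise ValueError("mean_constraint must be in (0, 1]")
    density = overall_decrease / mean_constraint
    if density > 1:
        raise ValueError("implied density exceeds 100% of sites")
    return density


overall = 0.024  # an over-all decrease within the 1-4% range given in the abstract
strong = implied_density(overall, 1.0)   # first model: rare, strongly constrained
weak = implied_density(overall, 0.03)    # second model: frequent, weakly constrained
print(f"strong constraint: {strong:.1%} of sites")
print(f"weak constraint: {weak:.1%} of sites")
```

Under these assumptions, a 2.4% over-all decrease is compatible either with about 2.4% of sites tolerating no substitutions or with about 80% of sites each showing a 3% rate reduction, which is why the decrease alone does not fix the density of functional sites.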