XianMing Wu, Laurence D Hurst.
Abstract
Where in genes do pathogenic mutations tend to occur, and does this provide clues as to the possible underlying mechanisms by which single nucleotide polymorphisms (SNPs) cause disease? As splice-disrupting mutations tend to occur predominantly at exon ends, which are also known to be hot spots of cis-exonic splice control elements, we examine the relationship between the relative density of such exonic cis-motifs and pathogenic SNPs. In particular, we focus on the intragene distribution of exonic splicing enhancers (ESEs) and the covariance between them and disease-associated SNPs. In addition to showing that disease-causing genes tend to be genes with a high intron density, consistent with missplicing, we consider five factors established as trends in ESE usage: relative position in exons, relative position in genes, flanking intron size, splice site usage, and phase. We find that more than 76% of pathogenic SNPs are within 3-69 bp of exon ends, where ESEs generally reside, this being 13% more than expected. Overall, from the enrichment of pathogenic SNPs at exon ends, we estimate that approximately 20-45% of SNPs affect splicing. Importantly, we find that within genes pathogenic SNPs tend to occur in splicing-relevant regions with low ESE density: they occur preferentially in the terminal half of genes, in exons flanked by short introns, and at the ends of phase (0,0) exons with a 3' non-"AGgt" splice site. We suggest the concept of the "fragile" exon, one that is home to pathogenic SNPs because its low ESE density leaves it vulnerable to splice disruption.
Keywords: exonic splicing enhancer; pathogenic SNPs; splice site; splicing cis-motif
Year: 2015 PMID: 26545919 PMCID: PMC4866546 DOI: 10.1093/molbev/msv251
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Cis-Motif Usage Correlates Significantly with Proportion of Phase Zero Splice Sites.
| | All Exons (AA) | All Exons (codon) | Random 5,000 Exons (AA) | Random 5,000 Exons (codon) |
|---|---|---|---|---|
| Log BF (a) | 140.6781 | 103.9811 | 82.6049 | 114.0758 |
| R Trait 1 2 (b) | <0 | <0 | <0 | <0 |
Note.—Y, proportion of amino acids/codons showing significant trends; Pphase-zero, proportion of intercodon splice site.
(a) Log BF (log Bayes factor) = 2 × (log[harmonic mean (complex model)] − log[harmonic mean (simple model)]). All Log BF values in the table are >10, so the evidence for every correlation is very strong.
(b) The sign of the BayesTraits parameter “R Trait 1 2” indicates whether the correlation is positive (>0) or negative (<0).
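As a minimal sketch of the Log BF formula in the note above, the snippet below computes the log Bayes factor from the log harmonic-mean likelihoods of two models and applies the >10 threshold quoted in the note. The function name and both input values are hypothetical: they were chosen only so that the result matches the first table entry (140.6781) and are not values reported by the authors.

```python
def log_bayes_factor(log_hm_complex: float, log_hm_simple: float) -> float:
    """Return Log BF = 2 * (log harmonic mean, complex - log harmonic mean, simple)."""
    return 2.0 * (log_hm_complex - log_hm_simple)


VERY_STRONG = 10.0  # threshold quoted in the table note

# Hypothetical log harmonic means (illustrative only, not from the paper).
log_hm_complex = -1000.0      # model allowing the two traits to correlate
log_hm_simple = -1070.33905   # model assuming no correlation

log_bf = log_bayes_factor(log_hm_complex, log_hm_simple)
is_very_strong = log_bf > VERY_STRONG

print(f"Log BF = {log_bf:.4f}, very strong: {is_very_strong}")
```

The statistic only measures the strength of support for the complex model; the direction of the correlation is read separately from the sign of “R Trait 1 2”.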
Figure. Pathogenic SNPs are enriched close to exon junctions. (a) Of 8,250 pathogenic SNPs in internal exons, the great majority (76.62%) are within 3–69 bp of the exon ends. We consider enrichment of SNPs in three domains: 1) splice sites (≤3 bp) are greatly enriched for pathogenic SNPs (Observed: 5.49%, Expected: 3.37%); 2) exon terminal domains (3–69 bp) show a significant excess of pathogenic SNPs (Observed: 76.62%, Expected: 64.26%); 3) exon cores are relatively depleted of pathogenic SNPs (Observed: 17.89%, Expected: 32.38%) (χ² = 841.64, df = 2, P < 1.74 × 10^−183). (b) The same pattern of significant deviation in the distribution of pathogenic SNPs is observed for: 1) the 5′-half of internal exons (χ² = 410.62, df = 2, P < 6.85 × 10^−90); 2) the 3′-half of internal exons (χ² = 551.74, df = 2, P < 1.55 × 10^−120). (c) The distribution of nonsense pathogenic SNPs in internal exons is similar to that of non-nonsense mutations (χ² = 455.37, df = 2, P < 1.31 × 10^−99).
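The goodness-of-fit test in panel (a) can be approximately reproduced from the percentages quoted in the caption. The sketch below assumes that counts can be recovered by multiplying each rounded percentage by 8,250, so the statistic will differ slightly from the reported 841.64. The function and variable names are illustrative. For df = 2 the chi-square upper-tail probability has the closed form exp(-chi2/2), so no statistics library is needed.

```python
import math

N_SNPS = 8250  # pathogenic SNPs in internal exons (from the caption)

# Percentages quoted in the caption, by exon domain.
observed_pct = {"splice_site": 5.49, "terminal": 76.62, "core": 17.89}
expected_pct = {"splice_site": 3.37, "terminal": 64.26, "core": 32.38}


def chi_square(observed: dict, expected: dict, n: int) -> float:
    """Pearson chi-square statistic from percentage tables and a total count."""
    stat = 0.0
    for domain, obs_pct in observed.items():
        obs = obs_pct / 100.0 * n            # approximate observed count
        exp = expected[domain] / 100.0 * n   # approximate expected count
        stat += (obs - exp) ** 2 / exp
    return stat


chi2 = chi_square(observed_pct, expected_pct, N_SNPS)
df = len(observed_pct) - 1        # three domains -> 2 degrees of freedom
p_value = math.exp(-chi2 / 2.0)   # exact upper tail, valid for df = 2 only

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.2e}")
```

The largest contribution to the statistic comes from the exon core, where far fewer pathogenic SNPs are observed than expected. The small gap between this result and the published value is presumably due to rounding of the percentages; the authors presumably worked from raw counts.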
Evidence for Correlation between Proportion of Phase Zero Splice Sites and Splice-Related Genomic Traits.
| | X | N | M |
|---|---|---|---|
| Log BF (a) | 58.5625 | 45.2272 | 26.6799 |
| R Trait 1 2 (b) | >0 | <0 | <0 |
Note.—X, mean CDS length/gene length; N, introns per kb exon; M, mean intron size; Pphase-zero, Proportion of phase zero splice site.
(a) Log BF (log Bayes factor) = 2 × (log[harmonic mean (complex model)] − log[harmonic mean (simple model)]). All Log BF values in the table are >10, so the evidence for every correlation is very strong.
(b) The sign of the BayesTraits parameter “R Trait 1 2” indicates whether the correlation is positive (>0) or negative (<0).