Eliseos J Mucaki, Ben C Shirley, Peter K Rogan.
Abstract
Splice isoform structure and abundance can be affected by either noncoding or masquerading coding variants that alter the structure or abundance of transcripts. When these variants are common in the population, these nonconstitutive transcripts are sufficiently frequent so as to resemble naturally occurring, alternative mRNA splicing. Prediction of the effects of such variants has been shown to be accurate using information theory-based methods. Single nucleotide polymorphisms (SNPs) predicted to significantly alter natural and/or cryptic splice site strength were shown to affect gene expression. Splicing changes for known SNP genotypes were confirmed in HapMap lymphoblastoid cell lines with gene expression microarrays and custom designed q-RT-PCR or TaqMan assays. The majority of these SNPs (15 of 22) as well as an independent set of 24 variants were then subjected to RNAseq analysis using the ValidSpliceMut web beacon (http://validsplicemut.cytognomix.com), which is based on data from the Cancer Genome Atlas and International Cancer Genome Consortium. SNPs from different genes analyzed with gene expression microarray and q-RT-PCR exhibited significant changes in affected splice site use. Thirteen SNPs directly affected exon inclusion and 10 altered cryptic site use. Homozygous SNP genotypes resulting in stronger splice sites exhibited higher levels of processed mRNA than alleles associated with weaker sites. Four SNPs exhibited variable expression among individuals with the same genotypes, masking statistically significant expression differences between alleles. Genome-wide information theory and expression analyses (RNAseq) in tumor exomes and genomes confirmed splicing effects for 7 of the HapMap SNPs and 14 SNPs identified from tumor genomes. q-RT-PCR resolved rare splice isoforms with read abundance too low for statistical significance in ValidSpliceMut. Nevertheless, the web beacon provides evidence of unanticipated splicing outcomes, for example, intron retention due to compromised recognition of constitutive splice sites. Thus, ValidSpliceMut and q-RT-PCR represent complementary resources for identification of allele-specific, alternative splicing.
Keywords: allele-specific gene expression; alternative splicing; cryptic splicing; information theory; intron retention; mRNA splicing; mutation; single nucleotide polymorphism
Year: 2020 PMID: 32211018 PMCID: PMC7066660 DOI: 10.3389/fgene.2020.00109
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Splicing Impact of rs1805377 (XRCC4). The natural acceptor of XRCC4 exon 8 is abolished by rs1805377 (11.5 -> 0.6 bits) while simultaneously strengthening a second exonic cryptic acceptor 6 nt downstream (11.4 to 11.8 bits), resulting in a 6 nt deletion in the mRNA. (A) Both of these acceptor sites have been validated in GenBank mRNAs, i.e., NM_022406 and NM_003401 (UCSC panel derived from http://genome.ucsc.edu). (B) The relative abundance of the two splice forms was determined by q-RT-PCR. The weaker rs1805377 A/A genotype (0.6 bit acceptor) was used ~47-fold less frequently than the cryptic downstream acceptor (11.8 bits). (C) The two splice isoforms cannot be distinguished by the exon microarray as the upstream probeset (ID 2818500) does not overlap the variable region, though the average expression of the rs1805377 A/A genotype is reduced. (D) ValidSpliceMut flagged this mutation for intron retention, which can be observed in the RNAseq of heterozygous ICGC patient DO27779 [Box 1]. Use of both acceptor sites is also evident [Box 2]. For more detail for this and all of the other single nucleotide polymorphisms (SNPs) analyzed, refer to .
Figure 2. Splicing Impact of rs2070573 (C21orf2). (A) The single nucleotide polymorphism (SNP) rs2070573 is a common polymorphism which alters the first nucleotide of the extended form of C21orf2 exon 6. (B) The donor site is strengthened by the presence of the C-allele (Ri 0.4 to 4.0 bits; A > C) and its use extends the exon by 360 nt. q-RT-PCR found a ~4- to 9-fold and ~17- to 23-fold increase in the extended exon 6 splice form in the A/C and C/C cell lines tested, respectively. (C) The exon microarray probeset which detects the extension (ID 3934488) shows a stepwise increase in SI with C-allele individuals, which supports the q-RT-PCR result. (D) The variant was present in ValidSpliceMut, which associated the A-allele with an increase in total intron retention [six patients flagged for total intron retention read abundance; p = 0.019 (average over all patients)]. This image displays sequence read distributions in the RNAseq data of TCGA BRCA patient TCGA-BH-A0H0, who is heterozygous for rs2070573. The IGV panel indicates reads corresponding to total intron 6 retention [Box 1] and reads which extend beyond the constitutive donor splice site of exon 6 into the adjacent intron [Box 2]. All 4 reads which extend over the exon splice junction are derived from the G-allele (strong binding site; not all visible in panel D).
Figure 3. Splicing Impact of rs1333973 (IFI44L). (A) The natural donor of IFI44L exon 2 is weakened from 9.1 to 4.6 bits (A > T) by rs1333973, increasing the frequency of exon 2 skipping and other events. (B) By q-RT-PCR, skipping was found to be 15.6-fold higher while normal splicing was 15.4-fold lower in A homozygotes (relative to T homozygotes). (C) Exon microarray data strongly support the q-RT-PCR findings. (D) ICGC patient DO6354 is homozygous for the T-allele, which resulted in exon 2 skipping [Box 3], failure to recognize the exon 2 donor causing total retention of intron 2 [Box 1], and activation of an upstream exonic 2.4 bit cryptic donor 375 nt from the affected site [Box 2].
Summary of q-RT-PCR results.
| Gene (ID / rsID / HGVS notation) | Splice type | Information change in bits (n, natural site; c, cryptic site) | Fold change | SNP effect (increase/decrease in fold change of homozygotes): natural site | SNP effect: cryptic site | SNP effect: exon skipping | SNP effect: alternate exon | SNP effect: total mRNA | In-frameᵃ | Additional expression evidence: exon microarray | Additional expression evidence: RNAseq (ValidSpliceMut) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| S1.6 / rs1805377 / | A | 11.5 (G) -> 0.6 (A) (n) | 1886 | 38.4 | – | – | – | n/aᵇ | Y | N | Y |
| | A | 11.4 (G) -> 11.8 (A) (n) | 1.3 | – | 47.3ᶜ | – | – | n/aᵇ | Y | N | Y |
| S1.4 / rs2243187 / NM_013371: c.364-1G>A | A | 4.7 (G) -> -6.2 (A) (n) | 26.3 | 1.8ᵈ | 1.8ᵈ | 2.1ᵈ | – | n/aᵇ | Y | Y | N |
| S1.1 / rs2070573 / NM_004928: c.643-137A>C | D | 0.4 (A) -> 4.0 (C) (c) | 12.1 | NC | 22.6 | – | – | n/aᵇ | Y | Y | Y |
| S1.21 / rs2835655 / NM_003316: c.5115G>A | D | 11.2 (G) -> 8.2 (A) (n) | 8.1 | 1.5 | – | 1.3ᵉ | – | 1.5 | – | N | N |
| S1.5 / rs2835585 / NM_003316: c.188-8T>A | A | 7.6 (T) -> 5.4 (A) (n) | 4.5 | NC | – | 8.8 | – | n/aᵇ | Y | N | Y |
| S1.13 / rs17002806 / NM_152613: c.*86G>A | D | 9.2 (G) -> 6.0 (A) (n) | 9.1 | 23.8 | 34+ᵍ | – | – | n/aᵇ | N | Y | N |
| S1.8 / rs3747107 / NR_024448: n.2961+6512C>G | A | 4.7 (C) -> -7.0 (G) (n) | 25.9 | 98.3 | 31/42.8ʰ | – | 1.6 | n/aᵇ | Y/Nⁱ | N | N |
| S1.9 / rs2266988ᵏ / NM_006115: c.19G>A | D | 7.8 (G) -> 6.2 (A) (n) | 3.0 | NC | – | 8.8 | – | n/aᵇ | N | N | N |
| S1.14 / rs2072049 / NM_006115: c.954-16G>T | A | 8.2 (G) -> 7.1 (T) (n) | 2.2 | 2.6ᵈ | – | – | – | 3.1ᵈ | – | Y | Y |
| S1.10 / rs1893592 / | D | 8.7 (A) -> 4.2 (C) (n) | 22.9 | 3.0 | 2.0/1.7ˡ | N/Aᵐ | – | 1.4 | Y | N | Y |
| S1.15 / rs6003906 / NM_198440: c.328-8A>T | A | -2.1 (A) -> -4.3 (T) (n) | 4.6 | NC | 2.0 | – | – | n/aᵇ | N | Y | N |
| S1.3 / rs1018448 / NM_014570: c.1065C>A | A | 10.5 (C) -> 8.3 (A) (n) | 4.5 | 2.2 | – | 1.4 | – | 2.0 | Y | Y | N |
| S1.2 / rs10190751 / NM_003879: c.606+934G>A; NM_001127184: c.607-1G>A | A | 16.1 (G) -> 5.2 (A) (n) | 1885 | 10000+ | – | – | 2.1 | n/aᵇ | Y | Y | Y |
| S1.11 / rs1333973 / NM_006820: c.478+3A>T | D | 9.1 (A) -> 4.6 (T) (n) | 22.6 | 15.4 | – | 15.6 | – | 3.9 | N | Y | N |
| S1.12 / rs13076750 / NM_001167671: c.-66-8G>A | A | 9.3 (G) -> -1.6 (A) (n) | 625 | 5000+ | 15.8 | 26.6 | – | – | Y | N | Y |
| S1.7 / rs743920 / NM_133455: c.326C>G | A | 6.0 (C) -> 7.9 (G) (c) | 3.6 | n/aᵇ | 5.8 | – | – | n/aᵇ | – | Y | N |
| S1.22 / rs16994182 / NM_001146078: c.-82+4C>G | D | 7.4 (C) -> 6.8 (G) (n) | 1.6 | 5.9 | – | 2.1 | – | n/aᵇ | Y | N/A | N |
| S1.17 / rs16802 / NM_004327: c.2708-13A>G | A | 5.6 (A) -> 5.8 (G) (n) | 1.1 | NCⁿ | – | – | – | n/aᵇ | – | N | N |
| S1.19 / rs8130564 / NM_032404: c.66-13T>C | A | 4.2 (T) -> 4.4 (C) (n) | 1.1 | NCⁿ | – | – | – | n/aᵇ | – | N | N |
| S1.18 / rs2252576 / NM_012105: c.748-10C>T | A | 7.2 (C) -> 8.0 (T) (n) | 1.5 | NCⁿ | – | – | – | n/aᵇ | – | N | N |
| S1.20 / rs2285141 / NM_007326: c.-48-18G>T | A | 2.0 (G) -> 1.2 (T) (n) | 1.7 | NC | – | – | 1.8 | n/aᵇ | – | N | N |
| S1.16 / rs2838010 / NM_058186: c.20-217A>T | D | -10.8 (A) -> 7.8 (T) (c) | 228 | – | – | – | N/Aⁱ | n/aᵇ | Y | N | N |
Red text indicates a decrease in the abundance of a particular splice form, while green text indicates an increase in abundance. A, Acceptor Splice Site Affected; D, Donor Splice Site Affected; NC, Not detectable (abolished). ᵃ Splicing events which alter reading frame may induce nonsense-mediated decay; ᵇ No allele specific difference in expression and splicing; ᶜ Complete discrimination of both isoforms using a custom designed TaqMan probe; ᵈ Values from comparing heterozygote with homozygote common; ᵉ Change in splicing likely related to change in RNA level; ᶠ Intron 2-3 retention of TTC3 amplified by PCR, but no allele specific change detected; ᵍ This splice form not at detectable levels in homozygote; ʰ Cryptic acceptor 114 nt upstream of affected site / cryptic acceptor 118 nt upstream of affected site; ⁱ mRNA in-frame when alternate exon is used, and out of frame due to cryptic site use; ʲ PRAME is a special case where two SNPs affect splicing of two separate exons; ᵏ rs2266988 and rs1129172 are identical SNPs on opposite strands; ˡ Cryptic donor 555 nt downstream of affected site / cryptic donor 29 nt downstream of affected site; ᵐ Splice form not detected by PCR; ⁿ High variability between individuals with the same genotype by q-RT-PCR.
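The "Fold change" column is numerically consistent with the usual information-theory relationship in which a change of ΔRi bits corresponds to a 2^|ΔRi|-fold change in predicted splice site strength (for example, 0.4 -> 4.0 bits gives 2^3.6, about 12.1). The following Python sketch illustrates that arithmetic only. The function name and the `floor_at_zero` option are hypothetical; the option reflects an assumption that, in some rows where the final value is negative (e.g., 4.7 -> -7.0, tabulated as 25.9), the site is treated as having 0 bits. Not every row follows this convention, and small discrepancies (e.g., 1886 for 11.5 -> 0.6) may simply reflect rounding of the displayed Ri values.

```python
def fold_change(ri_before: float, ri_after: float, floor_at_zero: bool = False) -> float:
    """Fold change in predicted site strength for a change in information (bits).

    Hypothetical helper: assumes fold change = 2 ** |delta Ri|.
    If floor_at_zero is True, negative Ri values are treated as 0 bits,
    an assumption that reproduces some (not all) rows of the table.
    """
    if floor_at_zero:
        ri_before = max(ri_before, 0.0)
        ri_after = max(ri_after, 0.0)
    return 2.0 ** abs(ri_after - ri_before)


# rs2070573: cryptic donor strengthened from 0.4 to 4.0 bits (table value: 12.1)
print(round(fold_change(0.4, 4.0), 1))

# rs3747107: acceptor reduced from 4.7 to -7.0 bits (table value: 25.9)
print(round(fold_change(4.7, -7.0, floor_at_zero=True), 1))
```

Because the relationship is exponential, a 1-bit difference corresponds to a 2-fold change, which is why sub-bit changes such as 5.6 -> 5.8 yield fold changes close to 1.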
Abundance of mRNA splice forms relative to internal gene reference.
| Gene | rsID | mRNA Splice Form | Homozygotes strong | Homozygotes weak |
|---|---|---|---|---|
| | rs2070573 | Extended Exon 6 Splice Form | 45.3 ± 16.9 [2]¹ | 1.5 ± 0.2 [2] |
| | rs2835585 | Exon 3 Skipping | <0.1 ± 0.0 [2] | 0.3 ± 0.1 [2] |
| | rs2243187 | 3 nt Inclusion of Exon 5 | ~100 [1] | 53.2 ± 3.0 [1; Het.] |
| | rs2243187 | 3 nt Exclusion of Exon 5 | 61.0 ± 3.4 [1] | ~100 [1; Het.] |
| | rs2243187 | Exon 5 Skipping | 14.1 ± 1.4 [1] | 6.8 ± 1.1 [1; Het.] |
| | rs1893592 | 29 nt Retention of Intron 10 | 8.6 ± 4.6 [1] | 4.2 ± 5.8 [1] |
| | rs1805377 | 6 nt Inclusion of Exon 8 | ~100 [1] | 3.1 ± 0.5 [1] |
| | rs1805377 | 6 nt Exclusion of Exon 8 | 2.6 ± 1.1 [1] | 61.4 ± 26.3 [1] |
| | rs2266988 | Normal Exon 3 Splicing | 32.4 ± 3.6 [1] | 20.1 ± 3.9 [1] |
| | rs2266988 | Exon 3 Skipping | 1.2 ± 1.2 [1] | 3.5 ± 3.0 [1] |
| | rs3747107 | Exon 8 Splicing | 34.0 ± 0.6 [1] | 0.3 ± 0.0 [2] |
| | rs3747107 | Alternative Exon 8 | 37.3 ± 7.3 [1] | 39.8 ± 3.7 [2] |
| | rs3747107 | 114 nt Retention of Intron 7 | 0.1 ± 0.0 [1] | 1.4 ± 0.1 [2] |
| | rs3747107 | 118 nt Retention of Intron 7 | 0.2 ± 0.0 [1] | 8.0 ± 0.5 [2] |
| | rs6003906 | Normal Exon 5 Splicing | ~100 [1] | 55.8 ± 0.0 [1] |
| | rs6003906 | Extended Exon 4; Short Exon 5 | 3.2 ± 0.0 [1] | 4.0 ± 0.0 [1] |
| | rs1018448 | Exon 12 Skipping | 12.3 ± 5.7 [3] | 23.1 ± 10.1 [1] |
| | rs1333973 | Normal Exon 2 Splicing | 57.0 ± 0.0 [1] | 14.7 ± 0.0 [1] |
| | rs1333973 | Exon 2 Skipping | 0.8 ± 0.0 [1] | 48.0 ± 0.0 [1] |
| | rs10190751 | Upstream Exon 7 Use | ~100 [1] | <0.1 ± 0.0 [1] |
| | rs10190751 | Downstream Exon 7 Use | 39.0 ± 0.0 [1] | 87.1 ± 0.0 [1] |
| | rs17002806 | 25 nt Intron 6 Retention | N.D.² | 2.3 ± 0.0 [1] |
| | rs2285141 | Alternate Exon 2 Use | <0.1 ± 0.0 [1] | <0.1 ± 0.0 [1] |
| | rs743920 | 6 nt deletion of Exon 4 | 69.0 ± 9.7 [1] | 9.2 ± 1.4 [1] |
| | rs16994182 | Exon 2 Skipping | 1.9 ± 0.0 [1] | 5.3 ± 0.0 [1] |
¹ Average expression was computed by comparing qPCR Ct values across multiple experimental runs and normalized against Ct of internal gene reference. SNPs tested in multiple experiments with one individual of each genotype will have a standard deviation of 0.0. ² Heterozygote; individuals who are homozygous for IL19 SNP rs2243187 were not available for testing. N.D., Not detected. Ct values were not available for LPP rs13076750.
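The normalization described in the footnote can be illustrated with the standard ΔCt calculation, in which abundance relative to a reference amplicon is 2^(Ct_reference − Ct_target). The sketch below is an assumption about the general approach, not the authors' exact procedure: it assumes roughly 100% amplification efficiency, expresses the result as a percentage of the internal gene reference, and uses hypothetical function names and made-up Ct values.

```python
from statistics import mean, stdev


def relative_abundance(ct_target: float, ct_reference: float) -> float:
    """Percent abundance of a splice form relative to the internal reference.

    Assumes perfect doubling per cycle (delta-Ct method); hypothetical helper.
    """
    return 100.0 * 2.0 ** (ct_reference - ct_target)


def summarize(runs):
    """Mean and sample standard deviation over (ct_target, ct_reference) runs.

    A single run is reported with a standard deviation of 0.0 by convention here.
    """
    values = [relative_abundance(t, r) for t, r in runs]
    sd = stdev(values) if len(values) > 1 else 0.0
    return mean(values), sd


# Illustrative Ct pairs (target, reference) from two hypothetical runs
runs = [(26.0, 25.0), (26.2, 25.1)]
avg, sd = summarize(runs)
print(f"{avg:.1f} ± {sd:.1f}")
```

On this scale a splice form that crosses threshold one cycle later than the reference is reported as 50, and one that matches the reference as 100, which is how values such as "~100" in the table can be read.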
RNAseq analysis of natural splice sites weakened by common single nucleotide polymorphisms (SNPs).
| Gene | rsID¹ | HGVS Notation | Initial information (bits) | Final information (bits) | Change (bits) | Alternative Splicing Observed |
|---|---|---|---|---|---|---|
| | | | 6.1 | 4.5 | −1.6 | Intron Retention; Cryptic Site Use |
| | | 5:64890479A > C | 10.0 | 7.5 | −2.5 | Intron Retention |
| | | 5:79773028T > G | 14.1 | 11.8 | −2.4 | Intron Retention |
| | | | 2.5 | 1.4 | −1.1 | Intron Retention; Cryptic Site Use |
| | | | 6.9 | 4.4 | −2.5 | Exon Skipping |
| | | | 10.4 | 7.1 | −3.3 | Intron Retention |
| | | | 15.6 | 14.5 | −1.1 | Intron Retention |
| | | | 9.6 | 7.0 | −2.6 | Intron Retention |
| | | | 7.5 | 4.9 | −2.5 | Exon Skipping |
| | | | 4.0 | 2.7 | −1.4 | Intron Retention |
| | | | 6.8 | 5.0 | −1.7 | Intron Retention; Exon Skipping |
| | | | 11.4 | 9.9 | −1.4 | Intron Retention |
| | | 6:31506648G > T | 4.8 | 3.7 | −1.1 | Intron Retention; Cryptic Site Use |
| | | | 5.9 | 4.4 | −1.5 | Intron Retention |
| | | 5:102537200T > G | 12.5 | 11.3 | −1.3 | Wildtype Only |
| | | 1:59131311G > T | 10.4 | 8.7 | −1.7 | Wildtype Only |
| | | | 6.2 | 3.6 | −2.6 | Wildtype Only |
| | | | 10.2 | 8.9 | −1.3 | Wildtype Only |
| | | 4:76722353G > A | 11.8 | 8.8 | −3.0 | Wildtype Only |
| | | | 14.1 | 13.0 | −1.1 | Wildtype Only |
| | | | 3.7 | 2.4 | −1.3 | Wildtype Only |
| | | | 4.6 | 3.4 | −1.1 | Wildtype Only |
| | | | 5.3 | 4.2 | −1.1 | Wildtype Only |
| | | | 6.9 | 4.7 | −2.2 | Wildtype Only |
¹ rsIDs are hyperlinked to their associated dbSNP page; ² If present, variant coordinates are hyperlinked to the ValidSpliceMut database. Thick bars separate SNP-affected exons with and without RNAseq-observed alternate splicing events.