Jasmin Coulombe-Huntington, Kevin C L Lam, Christel Dias, Jacek Majewski.
Abstract
Recently, thanks to the increasing throughput of new technologies, we have begun to explore the full extent of alternative pre-mRNA splicing (AS) in the human transcriptome. This is unveiling a vast layer of complexity in isoform-level expression differences between individuals. We used previously published splicing-sensitive microarray data from lymphoblastoid cell lines to conduct an in-depth analysis of the splicing efficiency of known and predicted exons. By combining publicly available AS annotation with a novel algorithm designed to search for AS, we show that many real AS events can be detected within the usually unexploited, speculative majority of the array and at significance levels much below standard multiple-testing thresholds, demonstrating that the extent of cis-regulated differential splicing between individuals is potentially far greater than previously reported. Specifically, many genes show subtle but significant genetically controlled differences in splice-site usage. PCR validation shows that 42 out of 58 (72%) candidate gene regions undergo detectable AS, amounting to the largest scale validation of isoform eQTLs to date. Targeted sequencing revealed a likely causative SNP in most validated cases. In all 17 instances where a SNP affected a splice-site region, in silico splice-site strength modeling correctly predicted the direction of the microarray and PCR results. In 13 other cases, we identified likely causative SNPs disrupting predicted splicing enhancers. Using Fst and REHH analysis, we uncovered significant evidence that 2 putative causative SNPs have undergone recent positive selection. We verified the effect of five SNPs using in vivo minigene assays. This study shows that splicing differences between individuals, including quantitative differences in isoform ratios, are frequent in human populations and that causative SNPs can be identified using in silico predictions.
Several cases affected disease-relevant genes and it is likely some of these differences are involved in phenotypic diversity and susceptibility to complex diseases.
Year: 2009 PMID: 20011102 PMCID: PMC2780703 DOI: 10.1371/journal.pgen.1000766
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Validating isoform eQTLs from automated capillary electrophoresis of RT–PCR products.
(A,B) Example capillary electrophoresis fluorescence readings for the AMACR gene for individuals with AA and GG genotypes for SNP rs3195676. (C) Estimated isoform ratios of the AMACR gene for each individual, as a function of the SNP genotype. See Methods for details.
Table 1. RT–PCR validated AS candidates.
| Gene | Probeset ID | Gene region | AS Type | Most likely causative SNP | Array regression P-value | PCR validation regression P-value |
| C14orf129 | 3550335 | 5′ UTR | Alternative SS | rs2053588 | 3.40E-06 | 1.90E-09 |
| C8orf59 | 3142947 | 5′ UTR | Cassette exon | chr8:86318715 | 3.10E-27 | 1.30E-06 |
| AMACR | 2852757 | Coding | Cassette exon | rs10941112 | 2.40E-08 | 2.70E-06 |
| USP8 | 3593685 | Coding | Cassette exon | rs4775889 | 3.40E-06 | 7.10E-06 |
| PLD2 | 3707250 | Coding | Alternative SS | rs3764897 | 1.80E-06 | 5.30E-06 |
| ZNF419 | 3843285 | Coding | Cassette exon | rs11672136 | 3.00E-07 | 7.10E-06 |
| UEVLD | 3365492 | Coding | Cassette exon | rs56151250 | 2.70E-08 | 2.30E-05 |
| SMAD5 | 2830027 | 5′ UTR | Cassette exon | unknown | 2.00E-05 | 1.60E-04 |
| TMEM77 | 2427753 | 5′ UTR | Cassette exon | rs3762374 | 2.70E-15 | 1.70E-04 |
| SERGEF | 3365169 | Coding | Cassette exon | rs211146 | 1.20E-10 | 1.90E-04 |
| MMAB | 3470844 | Coding | Cassette exon | rs2287180 | 4.20E-08 | 8.00E-04 |
| SNORD49 | 3712109 | Coding | Cassette exon | unknown | 5.70E-07 | 9.50E-04 |
| RNASEN | 2852054 | Coding | Cassette exon | rs55656741 | 4.50E-06 | 4.30E-03 |
| DUSP18 | 3957502 | 5′ UTR | Cassette exon | rs5753268 | 1.70E-10 | 6.30E-03 |
| IFI44L | 2343481 | Coding | Cassette exon | rs1333973 | 1.40E-06 | 7.80E-03 |
| SH3YL1 | 2537134 | Coding | Cassette exon | rs62114506 | 1.10E-09 | 1.10E-02 |
| SLC3A2 | 3333716 | Coding | Cassette exon | unknown | 1.40E-11 | 1.10E-02 |
| WDR67 | 3114099 | Coding | Cassette exon | rs6984928 | 1.30E-08 | 1.40E-02 |
| WARS | 3579582 | 5′ UTR | Cassette exon | rs941928 | 1.10E-13 | 1.50E-02 |
| TMEM149 | 3859924 | Coding | Alternative SS | rs17638853 | 7.90E-07 | 1.60E-02 |
| DMKN | 3859789 | Coding | Cassette exon | rs4254439 | 2.00E-06 | 2.20E-02 |
| RBCK1 | 3873192 | Coding | Cassette exon | rs41281892 | 1.00E-07 | 2.50E-02 |
| ESPL1 | 3415861 | Coding | Alternative SS | rs6580942 | 3.10E-09 | correct trend |
| RGL3 | 3850922 | Coding | Cassette exon | unknown | 2.80E-06 | correct trend |
| XPNPEP3 | 3946515 | Coding | Cassette exon | unknown | 7.80E-09 | correct trend |
| SGOL1 | 2665585 | Coding | Cassette exon | rs61729306 | 1.70E-10 | correct trend |
| GTF3C2 | 2545738 | 5′ UTR | Cassette exon | unknown | 3.10E-07 | correct trend |
| PPIL2 | 3938300 | 3′ UTR | retained intron | rs12484060 | 1.00E-10 | correct trend |
| ACP1 | 2466156 | Coding | Cassette exon | rs11553746 | 2.70E-12 | correct trend |
| MGC16169 | 2780811 | Coding | Cassette exon | rs12639869 | 2.90E-17 | correct trend |
| CCDC41 | 3466174 | 5′ UTR | Cassette exon | chr12:93353207 | 1.70E-05 | correct trend |
| DDX19A/B | 3667169 | Coding | Cassette exon | unknown | 7.30E-04 | correct trend |
| HNRPH1 | 2890160 | Coding | retained intron | rs34734159 | 4.10E-13 | variable ratios |
| BCKDHA | 3834195 | Coding | Alternative SS | rs12602 | 9.20E-05 | variable ratios |
| FAM64A | 3707965 | 3′ UTR | retained intron | rs7218283 | 3.40E-13 | variable ratios |
| ZNF83 | 3869658 | Coding | retained intron | rs7248435 | 2.70E-10 | variable ratios |
| IKIP | 3467329 | Coding | Cassette exon | unknown | 2.40E-05 | variable ratios |
| SIDT1 | 2636499 | Coding | Cassette exon | rs2271494 | 6.70E-04 | variable ratios |
| IL6 | 2992594 | Coding | Cassette exon | rs2069832 | 8.10E-05 | variable ratios |
| VISA | 3874507 | Coding | Alternative SS | rs17857295 | 1.10E-12 | isoforms detected |
| UBAP2 | 3203812 | Coding | Cassette exon | rs307682 | 1.30E-07 | isoforms detected |
| USP36 | 3772596 | 3′ UTR | retained intron | unknown | 2.20E-07 | isoforms detected |
Coordinates are given when the SNP does not exist in dbSNP. “Unknown” indicates there was no sequence information or very poor quality sequencing results.
P-value of the most significant correlation between an isoform's ratio and the associated SNP genotype.
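The regression columns in the table above relate an isoform ratio to SNP genotype. As a minimal sketch of that kind of test, assuming an additive coding of genotype as allele dosage (0, 1, 2) and ordinary least squares, the following computes the slope, the coefficient of determination and the t statistic. The function name, the dosage coding and the example values are illustrative assumptions, not the authors' pipeline or data.

```python
from math import inf, sqrt
from statistics import mean


def genotype_regression(dosages, ratios):
    """Least-squares fit of isoform ratio on allele dosage (0, 1 or 2).

    Returns (slope, intercept, r_squared, t_statistic). Converting the
    t statistic to a P-value needs a Student t distribution with
    n - 2 degrees of freedom, which is left out to stay stdlib-only.
    """
    n = len(dosages)
    if n != len(ratios) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = mean(dosages), mean(ratios)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in ratios)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, ratios))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and ratio must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    # Residual sum of squares; clamp tiny negative rounding errors to zero.
    ss_res = max(syy - slope * sxy, 0.0)
    if ss_res == 0.0:
        t_stat = inf if slope > 0 else -inf  # perfect fit
    else:
        t_stat = slope / sqrt(ss_res / (n - 2) / sxx)
    return slope, intercept, r_squared, t_stat


# Hypothetical data: 3 individuals per genotype (AA=0, AG=1, GG=2).
dosages = [0, 0, 0, 1, 1, 1, 2, 2, 2]
ratios = [0.10, 0.12, 0.08, 0.30, 0.28, 0.32, 0.50, 0.52, 0.48]
slope, intercept, r2, t = genotype_regression(dosages, ratios)
print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r2:.3f} t={t:.1f}")
```

A statistics package would normally supply the P-value directly; the hand-rolled version is used here only to keep the example free of third-party dependencies.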
Table 2. SNPs affecting splice-sites.
| Gene | SNP ID/new SNP | AS Type | Splice-site sequence | Maximum Entropy Score | Probeset Expression |
| C8orf59 | new SNP | A | | 8.38 | 138 |
| | | | | 0.19 | 12 |
| DMKN | rs4254439 | C | | 8.18 | 117 |
| | | | | 7.75 | 11 |
| ERAP2 | rs2248374 | B | | 9.33 | 69 |
| | | | | 7.61 | 297 |
| MGC16169 | rs12639869 | C | | 9.79 | 225 |
| | | | | 5.87 | 26 |
| PLD2 | rs3764897 | A | | 7.10 | 140 |
| | | | | 2.04 | 43 |
| SH3YL1 | rs62114506 | C | | 11.01 | 118 |
| | | | | 6.06 | 22 |
| TMEM77 | rs3762374 | C | | 6.59 | 2552 |
| | | | | −4.72 | 394 |
| ZNF419 | rs11672136 | D | | 8.87 | 56 |
| | | | | 6.65 | 13 |
| PARP2 | rs2297616 | A | | 9.45 | 1378 |
| | | | | 6.77 | 86 |
| ULK4 | rs1716698 | C | | 8.76 | 123 |
| | | | | 3.63 | 13 |
| FAM64A | rs7218283 | E | | 5.25 | 12 |
| | | | | 1.79 | 73 |
| IFI44L | rs1333973 | C | | 9.79 | 4602 |
| | | | | 7.81 | 711 |
| PPIL2 | rs12484060 | B | | 5.52 | 154 |
| | | | | 2.04 | 370 |
| OVGP1 | rs1264894 | D | | 10.03 | 101 |
| | | | | 9.22 | 28 |
| TMEM149 | rs17638853 | A | | 9.60 | 193 |
| | | | | 1.42 | 32 |
| C14orf129 | rs2053588 | B | | 9.04 | 9 |
| | | | | 0.23 | 62 |
| CAST | rs7724759 | C | | 11.11 | 647 |
| | | | | 7.68 | 117 |
See Figure 2 for a graphical depiction of the two alternative isoform structures and the relative position of the SNP-affected splice-site.
Upper-case bases represent consensus donor/acceptor site and bold font indicates SNP.
Maximum entropy score as calculated using MaxEntScan [15].
Averaged PLIER-summarized expression score for each homozygous genotype.
Ancestral genotype, as inferred from the chimpanzee genome.
Cases for which an inverse correlation between splice-site strength and probeset expression is expected based on the two isoform structures and the position of the affected splice-site, as shown in Figure 2.
Figure 2. AS type and affected splice-site for SNPs identified in Table 2 and Table 3.
The arrow indicates the splice-site affected by the polymorphism. The genes are read from left to right, as indicated by the intersecting arrow heads. The type of AS event and the identity of the affected splice-site are essential to understanding the relation between the probeset expression change and the theoretical efficiency of splicing. In (A,C,D), the correlation should be positive since the use of the splice-site produces a longer transcript, while in (B,E,F), an inverse relation is expected since the use of the splice-site produces a shorter transcript.
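The direction rule in this caption can be applied mechanically to the splice-site table. The sketch below encodes the rule as stated (positive relation for panels A, C, D; inverse for B, E, F) and checks whether the allele with the higher maximum entropy score shows the expected change in probeset expression. The function name and data layout are illustrative assumptions; the four example rows copy scores and expression values from the splice-site table.

```python
# Panel letters follow the Figure 2 caption: in A, C and D the affected
# splice-site yields the longer transcript (positive relation expected);
# in B, E and F it yields the shorter one (inverse relation expected).
POSITIVE_PANELS = frozenset("ACD")
INVERSE_PANELS = frozenset("BEF")


def direction_agrees(panel, allele_1, allele_2):
    """Check one SNP against the expected direction.

    Each allele is a (maxent_score, probeset_expression) pair.
    """
    if panel not in POSITIVE_PANELS | INVERSE_PANELS:
        raise ValueError(f"unknown AS panel: {panel!r}")
    d_score = allele_1[0] - allele_2[0]
    d_expr = allele_1[1] - allele_2[1]
    if d_score == 0 or d_expr == 0:
        raise ValueError("alleles must differ in both score and expression")
    same_sign = (d_score > 0) == (d_expr > 0)
    return same_sign if panel in POSITIVE_PANELS else not same_sign


# (gene, panel, allele 1, allele 2) taken from the splice-site table.
ROWS = [
    ("C8orf59", "A", (8.38, 138), (0.19, 12)),
    ("ERAP2", "B", (9.33, 69), (7.61, 297)),
    ("TMEM77", "C", (6.59, 2552), (-4.72, 394)),
    ("FAM64A", "E", (5.25, 12), (1.79, 73)),
]
for gene, panel, a1, a2 in ROWS:
    print(gene, panel, direction_agrees(panel, a1, a2))
```

The check only compares signs, so it says nothing about effect size; it mirrors the abstract's claim that splice-site strength modeling predicted the direction of the array and PCR results.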
Table 3. SNPs affecting predicted exonic splicing enhancers (ESEs).
| Gene | SNP ID | AS Type | Allele | Splicing Enhancer Sequence | Splicing Factor | ESE Finder Score | Probeset expression |
| VISA | rs17857295 | C | C | | SRp40 | 2.80 | 429 |
| | | | | | SRp40 | 2.80 | |
| | | | | | SC35 | 4.01 | |
| | | | | | SF2/ASF | 2.94 | |
| | | | G | None | - | - | 17 |
| SGOL1 | rs61729306 | C/D | A | | SF2/ASF | 3.42 | 95 |
| | | | | | SF2/ASF | 3.10 | |
| | | | | | SRp40 | 5.03 | |
| | | | T | | SF2/ASF | 2.67 | 5 |
| | | | | | SRp40 | 3.98 | |
| WARS | rs941928 | D | C | | SC35 | 2.34 | 184 |
| | | | G | None | - | - | 12 |
| UEVLD | rs56151250 | C | G | | SC35 | 2.41 | 52 |
| | | | | | SRp55 | 2.97 | |
| | | | C | None | - | - | 5 |
| AMACR | rs10941112 | C | A | | SRp40 | 4.94 | 381 |
| | | | | | SF2/ASF | 2.81 | |
| | | | G | | SRp40 | 3.57 | 76 |
| DUSP18 | rs5753268 | C | T | | SC35 | 3.13 | 58 |
| | | | | | SRp40 | 3.26 | |
| | | | C | | SRp55 | 3.89 | 12 |
| USMG5 | rs7911488 | D | C | | SF2/ASF | 2.40 | 1186 |
| | | | | | SRp55 | 3.02 | |
| | | | T | None | - | - | 282 |
| ESPL1 | rs6580942 | B | A | | SF2/ASF | 2.19 | 145 |
| | | | | | SF2/ASF | 2.57 | |
| | | | | | SC35 | 2.71 | |
| | | | | | SRp40 | 2.88 | |
| | | | C | | SF2/ASF | 2.16 | 418 |
| MMAB | rs2287180 | C | T | | SF2/ASF | 2.64 | 217 |
| | | | | | SRp40 | 2.72 | |
| | | | | | SRp55 | 3.47 | |
| | | | C | | SF2/ASF | 2.53 | 81 |
| | | | | | SRp40 | 3.03 | |
| RBCK1 | rs41281892 | C | G | | SF2/ASF | 4.15 | 95 |
| | | | | | SRp40 | 3.20 | |
| | | | A | | SF2/ASF | 2.37 | 5 |
| CCDC41 | New SNP at chr12:93353207 | C/D | G | | SC35 | 3.03 | 84 |
| | | | | | SRp35 | 2.82 | |
| | | | C | | SRp55 | 3.63 | 32 |
| ZNF83 | rs7248435 | F | C | | SRp55 | 3.77 | 13 |
| | | | A | | SRp55 | 3.41 | 56 |
| ATP5SL | rs1043413 | D | C | | SF2/ASF | 4.49 | 379 |
| | | | | | SRp40 | 3.62 | |
| | | | | | SC35 | 3.07 | |
| | | | G | | SC35 | 3.29 | 153 |
| | | | | | SRp40 | 2.80 | |
| RNASEN | rs55656741 | D | C | | SRp40 | 2.84 | 307 |
| | | | T | | SC35 | 2.80 | 174 |
| | | | | | SRp40 | 3.35 | |
| SERGEF | rs211146 | C | A | | SRp55 | 3.22 | 219 |
| | | | G | | SC35 | 2.45 | 129 |
| | | | | | SRp40 | 2.70 | |
| | | | | | SRp55 | 3.83 | |
| ACP1 | rs11553746 | D | T | None | - | - | 423 |
| | | | C | | SRp40 | 4.66 | 269 |
Note: The relative splice-site usage disagrees with expectations for the last 3 cases.
See Figure 2. Cases marked C/D are cases where the SNP is very close to the middle of the exon.
Bold font indicates the SNP position.
Score calculated using ESE Finder 3.0 online tool [16].
Averaged PLIER-summarized expression score for each homozygous genotype.
Ancestral genotype, as inferred from the chimpanzee genome.
SF2/ASF (IgM-BRCA1) [51].
Cases for which an inverse correlation between splicing efficiency and probeset expression is expected based on the two isoform structures and the position of the affected splice-site, as shown in Figure 2.
Figure 3. Electrophoresis bands of RT-PCR–amplified minigene mRNA products.
Each column represents a different individual of the specified genotype. Above each gene's bands is an abstract depiction of the expected isoform structures associated with each genotype, with boxes indicating exons and lines indicating introns (the relative lengths are not to scale). For (A–C), the causative SNPs are, respectively, rs2297616, rs2248374 and rs7724759, which all affect the splice-site region, as described in Table 2. In these cases, the band migrations demonstrate that the individual genotypes are tightly linked with a complete change in isoform length, and only the associated isoform is expected to be present in individuals of a specific homozygous genotype. In (D), the causative SNPs for AMACR and ATP5SL are, respectively, rs10941112 and rs1043413, which are exonic and disrupt ESE sequences, as described in Table 3. For MMAB, the causative SNP can be any one of 3 consecutive, fully linked SNPs, the prime candidate being rs2287180, which also disrupts ESE sequences (see Table 3). In these last 3 cases, instead of a complete switch in isoform length, we observe a change in the intensities of detectable isoforms that is perfectly associated with the SNP genotype. This is consistent with the less crucial role of ESEs compared with the immediate splice-site neighborhood. The isoform structures depicted represent the isoform structure that is favored in each genotype, relative to the other genotype. The leftmost column of each gel is a 100 bp increment reference ladder.