Sonia H Shah, Jacqueline A Pallas.
Abstract
BACKGROUND: With the availability of Affymetrix exon arrays, a number of tools have been developed to enable their analysis. However, these can be expensive or have several pre-installation requirements. This led us to develop a workflow for analysing differential splicing using freely available software packages that are already widely used for gene expression analysis. The workflow uses the packages in the standard installation of R and Bioconductor (BiocLite) to identify differential splicing. We use the splice index method with the LIMMA framework. The main drawback of this approach is that it relies on accurate estimates of gene expression from the probe-level data. Methods such as RMA and PLIER may misestimate gene expression when a large proportion of a gene's exons are spliced. We therefore present the novel concept of a gene correlation coefficient, calculated using only the probeset expression pattern within a gene. We show that genes with lower correlation coefficients are likely to be differentially spliced.
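The splice index (SI) idea can be illustrated with a short sketch. It assumes that each probeset's log2 signal is normalised by the log2 gene-level (transcript cluster) signal of the same sample, so that SI = log2(probeset) - log2(gene). The workflow itself does this in R/Bioconductor and tests the SI values with LIMMA's moderated statistics, which this Python/NumPy sketch does not reproduce; it only computes the group difference in mean SI that such a test would assess. The function names `splice_index` and `si_difference` are hypothetical.

```python
import numpy as np

def splice_index(probeset_log2, gene_log2):
    """Splice index: each probeset's log2 signal minus its gene's log2 signal.

    probeset_log2: array (n_probesets, n_samples) for the probesets of one gene
    gene_log2:     array (n_samples,) gene-level (transcript cluster) log2 signal
    """
    probeset_log2 = np.asarray(probeset_log2, dtype=float)
    gene_log2 = np.asarray(gene_log2, dtype=float)
    # Subtracting in log space is equivalent to dividing raw intensities,
    # so a uniform change in gene expression cancels out of the SI.
    return probeset_log2 - gene_log2[np.newaxis, :]

def si_difference(si, in_group):
    """Mean SI in the tissue of interest minus mean SI in all other samples."""
    in_group = np.asarray(in_group, dtype=bool)
    return si[:, in_group].mean(axis=1) - si[:, ~in_group].mean(axis=1)
```

For a probeset that tracks its gene exactly the difference is 0, whereas a probeset that drops relative to the gene in the other tissues gets a positive difference. Because the gene-level signal is subtracted from every probeset, any error in that estimate propagates into all of the gene's splice indices, which is the drawback the abstract describes.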
Year: 2009 PMID: 19154578 PMCID: PMC2636774 DOI: 10.1186/1471-2105-10-26
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. False positives. Examples of probesets falsely identified as differentially spliced. The expression plot shows the mean log2 intensities of the core probesets in each tissue, with standard error bars. The probesets are sorted by genomic location from 5' to 3'. Circled probesets had Benjamini-Hochberg-corrected p-values less than 0.0001. A) and B) False positives due to an exon that is non-responsive or unexpressed in all samples; C) a cross-hybridising probeset.
Figure 2. Analysis workflow. A) LIMMA analysis workflow, in which probeset- and transcript-level summarisation and normalisation are carried out using the Affymetrix Power Tools; all subsequent steps are carried out in R and Bioconductor. B) The effect of filtering low-expressed probesets on the intensity distribution of one of the breast samples. When the probesets that fall in the lower quartile (lowest 25% of intensities) in all samples are filtered out, the intensity distribution becomes closer to a normal distribution.
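A minimal sketch of the filtering step in panel B, assuming the rule is applied per sample: a probeset is dropped only if its log2 intensity lies at or below that sample's 25th percentile in every sample. The function name `filter_low_expressed`, the at-or-below comparison and the use of NumPy's default (linear) quantile interpolation are assumptions, not details given in the caption.

```python
import numpy as np

def filter_low_expressed(log2_matrix, quantile=0.25):
    """Drop probesets that fall in the lower quartile of every sample.

    log2_matrix: array (n_probesets, n_samples) of log2 intensities.
    Returns (filtered_matrix, keep_mask).
    """
    x = np.asarray(log2_matrix, dtype=float)
    # Per-sample cut-off: the 25th percentile of that sample's intensities
    cutoffs = np.quantile(x, quantile, axis=0)
    # A probeset counts as low-expressed only if it is in the lower
    # quartile in all samples; being low in some samples is not enough.
    low_everywhere = np.all(x <= cutoffs[np.newaxis, :], axis=1)
    keep = ~low_everywhere
    return x[keep], keep
```

Requiring the probeset to be low in all samples keeps exons that are expressed in only one tissue, which are exactly the candidates for tissue-specific splicing.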
Number of significant probesets
| | Unfiltered data | Filtered dataset A | Filtered dataset B |
| Breast-specific | 1725 | 669 | 273 |
| Cerebellum-specific | 12009 | 5890 | 3122 |
| Heart-specific | 2746 | 1492 | 476 |
| Kidney-specific | 3354 | 1382 | 377 |
| Liver-specific | 4956 | 2213 | 636 |
| Muscle-specific | 3153 | 1606 | 583 |
| Pancreas-specific | 4908 | 1984 | 817 |
| Prostate-specific | 2494 | 1292 | 531 |
| Spleen-specific | 1513 | 674 | 332 |
| Testes-specific | 10430 | 2819 | 1476 |
| Thyroid-specific | 897 | 387 | 112 |
| Total no. of significant probesets | 34208 | 14536 | 6895 |
| Total no. of transcript clusters analysed | 17881 | 14180 | 10414 |
| Total no. of transcript clusters with at least one significant probeset | 10122 | 6338 | 3564 |
The table shows the number of probesets with Benjamini-Hochberg-corrected p-values less than 0.0001 in each of the 11 comparisons for the unfiltered and filtered datasets. The table also shows the number of transcript clusters (genes) that were found to contain at least one significant probeset.
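The significance cut-off used throughout (Benjamini-Hochberg-corrected p-value below 0.0001) can be sketched as follows. This is the standard Benjamini-Hochberg step-up adjustment, written to match the definition used by R's `p.adjust(p, method = "BH")`; the function names are hypothetical, and the raw p-values are assumed to come from the LIMMA comparison of one tissue against all others.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (same definition as R's p.adjust "BH")."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    ranked = p[order] * n / np.arange(1, n + 1)
    # Step-up: each adjusted value is the minimum over all larger ranks
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)  # restore original order, cap at 1
    return adjusted

def count_significant(pvalues, alpha=1e-4):
    """Number of tests whose BH-adjusted p-value is below alpha."""
    return int(np.sum(bh_adjust(pvalues) < alpha))
```

Applying `count_significant` to the p-values of each of the 11 comparisons gives per-comparison counts of the kind shown in the table.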
Significant Splicing Events
| Affymetrix probeset ID | Gene and exon | LIMMA comparison | Rank in unfiltered dataset (by Benjamini-Hochberg-corrected p-value) | Rank in filtered dataset A (by Benjamini-Hochberg-corrected p-value) | Rank in filtered dataset B (by Benjamini-Hochberg-corrected p-value) |
| 3400083 | WNK1 exon 3 | Kidney vs. non-kidney | 53 | 11 | 1 |
| 3400090 | WNK1 exon 4 | Kidney vs. non-kidney | 82 | 21 | 4 |
| 3400056 | WNK1 exon 1 | Kidney vs. non-kidney | 161 | 44 | 9 |
| 3400080 | WNK1 exon 2 | Kidney vs. non-kidney | 507 | 169 | 35 |
| 3427830 | SLC25A3 exon 3B | Muscle vs. non-muscle | 25 | 3 | 2 |
| 3427830 | SLC25A3 exon 3B | Heart vs. non-heart | 106 | 41 | 8 |
| 3427827 | SLC25A3 exon 3A | Heart vs. non-heart | 236 | 112 | 26 |
| 3427827 | SLC25A3 exon 3A | Muscle vs. non-muscle | 263 | 126 | 35 |
| 3427827 | SLC25A3 exon 3A | Thyroid vs. non-thyroid | 308 | 114 | 22 |
| 3918911 | ITSN exon 40 | Cereb. vs. non-cereb | 480 | 225 | 63 |
| 3918909 | ITSN exon 40 | Cereb. vs. non-cereb | 1822 | 873 | 353 |
| 3918908 | ITSN exon 40 | Cereb. vs. non-cereb | 1884 | 885 | 359 |
| 3918903 | ITSN exon 35 | Cereb. vs. non-cereb | 2304 | 1107 | 466 |
| 2562711 | IMMT exon 6 | Heart vs. non-heart | 37 | 17 | 1 |
| 2319719 | KIF1B exon 20 | Heart vs. non-heart | 129 | 55 | Not present in dataset B |
| 2319721 | KIF1B exon 20 | Heart vs. non-heart | 134 | 61 | Not present in dataset B |
| 2319722 | KIF1B exon 20 | Cereb. vs. non-cereb | 486 | 218 | Not present in dataset B |
| 2319721 | KIF1B exon 20 | Cereb. vs. non-cereb | 1784 | 830 | Not present in dataset B |
Effect of filtering on the ranking of real splice events. All probesets in the table had a Benjamini-Hochberg-corrected p-value less than 0.0001, and each identified splicing event is supported by evidence in the literature.
Figure 3. Tissue-specific . The expression plot shows the mean log2 intensity signals (with standard error bars) of core probesets targeting SLC25A3 exons in muscle compared to non-muscle tissues (top panel) and in thyroid compared to non-thyroid tissues (bottom panel). The probesets are plotted from left to right by genomic location (5' to 3'). The horizontal dashed line shows the mean log2 intensity of the negative control probesets; probesets with intensities below this line are most likely unexpressed. Here, these probesets target either intronic regions or UTRs (coloured orange). Ensembl transcripts for SLC25A3 are shown below the plot. Probesets with Benjamini-Hochberg-corrected p-values less than 0.0001 are indicated by black arrows. Exon 3A appears to be retained in muscle and thyroid tissues, while exon 3B appears to have lower expression in muscle.
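The dashed background line in the expression plots suggests a simple check for unexpressed probesets. As a sketch, assuming "unexpressed" means that a probeset's mean log2 signal within a tissue group lies below the mean log2 signal of the negative control probesets (which control probesets are used is not stated here), the comparison could be written as below; the function name `likely_unexpressed` is hypothetical.

```python
import numpy as np

def likely_unexpressed(group_log2, negative_control_log2):
    """Flag probesets whose mean log2 signal lies below the negative-control mean.

    group_log2: array (n_probesets, n_samples) for one tissue group
    negative_control_log2: 1-D array of log2 intensities of negative-control probesets
    """
    background = float(np.mean(negative_control_log2))    # the dashed line
    probeset_means = np.asarray(group_log2, dtype=float).mean(axis=1)
    return probeset_means < background                     # True = likely unexpressed
```

Flagged probesets are the ones most at risk of producing false positives such as those in Figure 1 A and B.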
Figure 4. Tissue-specific . The expression plot shows the mean log2 intensity signals (with standard error bars) of core probesets targeting KIF1B exons in cerebellum compared to non-cerebellum tissues (top panel) and in cerebellum compared to muscle tissue (bottom panel). The horizontal dashed line shows the mean intensity of the negative control probesets; exons with signal below this line are likely to be unexpressed. Most exons have higher probeset signals in the cerebellum, except for the 5' UTR (coloured orange) and the terminal exon of the short transcript (marked by vertical dashed lines), which suggests that this exon has much lower expression in the cerebellum. All exons common to the long and short transcripts have the same expression levels in muscle and cerebellum, but exons unique to the long transcript have higher expression in the cerebellum, while exons unique to the short transcript have higher expression in muscle.
Figure 5. Pearson correlation coefficients. Transcript clusters were considered to have cerebellum-specific splicing if they contained at least one probeset with a Benjamini-Hochberg-corrected p-value less than 0.0001 in the cerebellum vs. non-cerebellum SI/LIMMA comparison. Over 80% of transcript clusters considered to have no cerebellum-specific splicing (purple) had correlation coefficients greater than 0.9 (more than 60% had coefficients greater than 0.95), whereas only 30% of transcript clusters with cerebellum-specific splicing (blue) had correlation coefficients greater than 0.9.
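The gene correlation coefficient can be sketched under an assumption about its exact form: here it is taken to be the Pearson correlation, across a gene's core probesets, between the mean probeset log2 profile in the tissue of interest and the mean profile in all other tissues. The function name `gene_correlation` is hypothetical.

```python
import numpy as np

def gene_correlation(probeset_log2, in_group):
    """Pearson correlation between two groups' probeset expression patterns.

    probeset_log2: array (n_probesets, n_samples) for the core probesets of one gene
    in_group: boolean mask over samples (True = tissue of interest)
    """
    x = np.asarray(probeset_log2, dtype=float)
    mask = np.asarray(in_group, dtype=bool)
    profile_in = x[:, mask].mean(axis=1)    # pattern along the gene, tissue of interest
    profile_out = x[:, ~mask].mean(axis=1)  # pattern along the gene, all other tissues
    return float(np.corrcoef(profile_in, profile_out)[0, 1])
```

Because a correlation ignores a uniform shift, a gene whose overall expression differs between tissues still scores close to 1, while a single exon that departs from the gene's pattern lowers the coefficient. No gene-level estimate from RMA or PLIER is required.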
Correlation Coefficients
| | ITSN | IMMT | KIF1B | SLC25A3 | WNK1 |
| breast vs. non-breast | 0.9889 | 0.9734 | 0.9473 | 0.9770 | 0.9709 |
| cereb. vs. non-cereb. | 0.9823 | 0.9661 | 0.9764 | ||
| heart vs. non-heart | 0.9711 | 0.9410 | |||
| kidney vs. non-kidney | 0.9656 | 0.9901 | 0.9215 | 0.9799 | |
| liver vs. non-liver | 0.9766 | 0.9452 | 0.9197 | 0.9875 | 0.9793 |
| muscle vs. non-muscle | 0.9799 | 0.9834 | 0.9771 | ||
| panc. vs. non-panc. | 0.9158 | 0.9772 | 0.9333 | 0.9747 | 0.9427 |
| prost. vs. non-prost | 0.9807 | 0.9940 | 0.9655 | 0.9786 | 0.9431 |
| spleen vs. non-spleen | 0.9852 | 0.9790 | 0.8928 | 0.9739 | 0.9769 |
| testes vs. non-testes | 0.9864 | 0.9960 | 0.9717 | 0.9827 | |
| thyroid vs. non-thyroid | 0.9889 | 0.9734 | 0.9473 | 0.9770 | 0.9709 |
| z-score between highest and lowest coefficients | 9.71 | 4.42 | 6.68 | 2.76 | 5.05 |
Pearson correlation coefficients of the 5 genes (rounded to 4 significant digits) calculated for all eleven tissue comparisons using the unfiltered dataset. Values in bold indicate that the gene had a significant p-value (corrected p-value < 0.0001) in that SI/LIMMA group comparison. The highest and lowest correlation coefficients for each gene were found to be significantly different (all had z-scores greater than 2).
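The caption does not state how the z-scores were obtained. One standard way to compare two Pearson coefficients is Fisher's r-to-z transformation, sketched below on the assumptions that the two coefficients are treated as independent and that n is the number of probesets behind each coefficient. Both are assumptions, and the function name is hypothetical.

```python
import math

def fisher_z_difference(r1, r2, n1, n2):
    """z-statistic for the difference between two Pearson correlations.

    Applies Fisher's r-to-z transform (atanh) and treats the coefficients as
    independent. n1, n2: number of points (here, probesets) behind each
    coefficient; each must be greater than 3.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each correlation needs more than 3 points")
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se
```

For example, with 28 points per coefficient, r = 0.9 against r = 0.5 gives z of about 3.26. The per-gene probeset counts needed to recompute the table's z-scores are not given here.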