Yongsheng Bai, Jeff Kinne, Brandon Donham, Feng Jiang, Lizhong Ding, Justin R Hassler, Randal J Kaufman.
Abstract
BACKGROUND: Most existing tools for detecting splicing events from next-generation sequencing data focus on generic splicing events. Consequently, special types of non-canonical splicing events in short mRNA regions (IRE1α-targeted) have not yet been thoroughly addressed at a genome-wide level using bioinformatics approaches in conjunction with next-generation technologies. During endoplasmic reticulum (ER) stress, the RNase Ire1α is known to splice out a short 26 nt region from the mRNA of the transcription factor Xbp1 non-canonically within the cytosol. This causes an open reading frame shift that induces expression of many downstream genes in reaction to ER stress as part of the unfolded protein response (UPR). We previously published an algorithm termed "Read-Split-Walk" (RSW) to identify non-canonical splicing regions using RNA-Seq data and applied it to ER stress-induced Ire1α heterozygote and knockout mouse embryonic fibroblast cell lines. In this study, we developed an improved algorithm, "Read-Split-Run" (RSR), for detecting genome-wide Ire1α-targeted genes with non-canonical spliced regions at a faster speed. We applied the RSR algorithm, using different combinations of several parameters, to the mouse embryonic fibroblast (MEF) cells previously tested with RSW and to the human Encyclopedia of DNA Elements (ENCODE) RNA-Seq data. We also compared the performance of RSR with two other tools for identifying alternative splicing events (TopHat (Trapnell et al., Bioinformatics 25:1105-1111, 2009) and Alt Event Finder (Zhou et al., BMC Genomics 13:S10, 2012)), using the spliced Xbp1 mRNA as a positive control: in these data sets we identified it as the top cleavage target, present in Ire1α (+/-) but absent in Ire1α (-/-) MEF samples. This comparison was also extended to human ENCODE RNA-Seq data.
Keywords: Alternative splicing; ENCODE; Non-canonical; RNA-Seq; Xbp1
Year: 2016 PMID: 27556805 PMCID: PMC5001233 DOI: 10.1186/s12864-016-2896-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 A screenshot of the Read-Split-Run web interface
Comparison of total number of junctions identified by RSR for five cases from Tg and Dtt treated samples
| Parameter | | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 |
|---|---|---|---|---|---|---|
| Variable Name | Minimum split size | | | | | |
| | Maximum candidate distance | | | | | |
| | Read mapping region boundary buffer size | | | | | |
| | Minimum candidate distance | 2 | 2 | 2 | 2 | 2 |
| | Minimum number of supporting reads | 2 | 2 | 2 | 2 | 2 |
| | Maximum good alignment allowed per read | 8 | 11 | 11 | 15 | 18 |
| Tg Het | Total number of junctions identified | 122 | 140 | 143 | 135 | 141 |
| Tg KO | Total number of junctions identified | 153 | 177 | 177 | 170 | 177 |
| Dtt Het | Total number of junctions identified | 6496 | 6614 | 6661 | 6673 | 5683 |
| Dtt KO | Total number of junctions identified (Novel/Known) | 6135 | 6247 | 6285 | 6308 | 5687 |
The bolded numbers show parameters with different test settings
Fig. 2 Number of junctions and clock time reported by RSR for the human ENCODE RNA-Seq sample (separated by chromosome). Blue bar: junctions; red bar: clock time; purple bar: reads processed (million)
Number of reads supporting the Xbp1 26 nt spliced region, as reported by RSR and other tools
| Software | 500 nM Thapsigargin (Tg), Het (+/-) | 500 nM Thapsigargin (Tg), KO (-/-) | 1 mM Dithiothreitol (Dtt), Het (+/-) | 1 mM Dithiothreitol (Dtt), KO (-/-) |
|---|---|---|---|---|
| Read-Split-Run (RSR) | 21 | 0 | 173 | 0 |
| TopHat | 23 | 86 | 59 | 289 |
| BWA | 0 | 0 | 67 | 0 |
| Bowtie2 | 0 | 0 | 171 | 0 |
| STAR | 0 | 0 | 0 | 0 |
| Alt Event Finder | 0 | 0 | 0 | 0 |
Fig. 3 Comparison of the number of spliced regions identified by RSR, Alt Event Finder, and TopHat. Blue bar: TopHat; red bar: RSR; green bar: Alt Event Finder
Fig. 4 Linear regression equations for mouse MEF Tg and Dtt experiments
The overlapping spliced junctions identified by TopHat and our RSR
| Software | 500 nM Thapsigargin (Tg), mouse-Het | 500 nM Thapsigargin (Tg), mouse-KO | 1 mM Dithiothreitol (Dtt), mouse-Het | 1 mM Dithiothreitol (Dtt), mouse-KO | ENCODE, human |
|---|---|---|---|---|---|
| TopHat | 956 | 923 | 8897 | 7847 | 237,155 |
| RSR | 144 | 183 | 6727 | 6343 | 32,597 |
| Common | 38 | 41 | 2398 | 2128 | 314 |
| Common/TopHat | 3.97 % | 4.44 % | 26.95 % | 27.12 % | 0.13 % |
| Common/RSR | 26.39 % | 22.40 % | 35.65 % | 33.55 % | 0.96 % |
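The two percentage rows in the table above are simple ratios of the shared junction count to each tool's total. As a minimal sketch for checking that arithmetic, the Python snippet below recomputes them from the counts in the table; the dictionary layout and the function name `overlap_percentages` are illustrative choices, not part of RSR or TopHat.

```python
# Junction counts copied from the overlap table above.
# The sample labels and dictionary layout are illustrative only.
COUNTS = {
    "Tg mouse-Het": {"tophat": 956, "rsr": 144, "common": 38},
    "Tg mouse-KO": {"tophat": 923, "rsr": 183, "common": 41},
    "Dtt mouse-Het": {"tophat": 8897, "rsr": 6727, "common": 2398},
    "Dtt mouse-KO": {"tophat": 7847, "rsr": 6343, "common": 2128},
    "ENCODE human": {"tophat": 237155, "rsr": 32597, "common": 314},
}


def overlap_percentages(tophat, rsr, common):
    """Return (Common/TopHat, Common/RSR) as percentages rounded to 2 decimals."""
    return round(100 * common / tophat, 2), round(100 * common / rsr, 2)


if __name__ == "__main__":
    for sample, c in COUNTS.items():
        pct_tophat, pct_rsr = overlap_percentages(c["tophat"], c["rsr"], c["common"])
        print(f"{sample}: {pct_tophat:.2f} % of TopHat, {pct_rsr:.2f} % of RSR")
```

Read this way, the shared junctions make up a larger fraction of the RSR calls than of the TopHat calls in every sample, because TopHat reports more junctions overall.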
Fig. 5 Pseudocode for the Read-Split-Run algorithm. The junctions output by step 6 of the algorithm can optionally be restricted to those supported by some minimum number of sequences
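The pseudocode itself is not reproduced in this record, so the following is only a rough Python sketch of the optional final filtering step the caption describes, not the authors' implementation. It assumes each split read has already been reduced to a hypothetical `(chrom, left_end, right_start)` tuple and that candidate distance means `right_start - left_end`; the defaults of 2 for minimum candidate distance and minimum supporting reads, and 50000 for maximum candidate distance, are taken from the parameter tables in this record.

```python
from collections import Counter


def filter_junctions(candidates, min_distance=2, max_distance=50000, min_support=2):
    """Count supporting reads per candidate junction and keep well-supported ones.

    candidates: iterable of (chrom, left_end, right_start) tuples, one per split
    read. This tuple layout is an assumption made for illustration.
    """
    counts = Counter(
        (chrom, left, right)
        for chrom, left, right in candidates
        # Keep only candidates whose gap lies inside the allowed distance window.
        if min_distance <= right - left <= max_distance
    )
    # Restrict the output to junctions supported by at least min_support reads.
    return {junction: n for junction, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Toy coordinates, made up for illustration; "chrA" is not a real contig.
    toy = [
        ("chrA", 100, 126),  # 26 nt gap, first supporting read
        ("chrA", 100, 126),  # 26 nt gap, second supporting read
        ("chrA", 500, 501),  # gap of 1 nt, below the minimum distance
        ("chrB", 10, 40),    # only one supporting read
    ]
    print(filter_junctions(toy))
```

In this toy input only the 26 nt junction survives, since the other two candidates fail the distance or the support threshold.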
The running parameter values employed for both Tg and Dtt samples
| Sample | Minimum split size (MS) | Maximum candidate distance (MD) | Read mapping region boundary buffer size (BB) |
|---|---|---|---|
| Tg | 8, 11, 12, 16 | 10000, 20000, 30000, 40000, 50000 | 1, 3, 5, 7, 9 |
| Dtt | 11, 16, 20, 24 | 10000, 20000, 40000, 50000 | 3, 5, 7, 9 |
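To make the size of this parameter space concrete, the sketch below enumerates every (MS, MD, BB) combination per sample with `itertools.product`. Whether every combination was actually run is an assumption here; the table only lists the values used, and the names `PARAMETER_GRID` and `parameter_combinations` are hypothetical.

```python
from itertools import product

# Values copied from the table above; MS, MD, and BB are the table's abbreviations.
PARAMETER_GRID = {
    "Tg": {
        "MS": [8, 11, 12, 16],
        "MD": [10000, 20000, 30000, 40000, 50000],
        "BB": [1, 3, 5, 7, 9],
    },
    "Dtt": {
        "MS": [11, 16, 20, 24],
        "MD": [10000, 20000, 40000, 50000],
        "BB": [3, 5, 7, 9],
    },
}


def parameter_combinations(sample):
    """Return every (MS, MD, BB) combination for a sample as a list of dicts.

    Assumes a full Cartesian product of the listed values, which the source
    does not explicitly confirm.
    """
    grid = PARAMETER_GRID[sample]
    keys = ("MS", "MD", "BB")
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


if __name__ == "__main__":
    for sample in PARAMETER_GRID:
        combos = parameter_combinations(sample)
        print(sample, len(combos), "combinations; first:", combos[0])
```

Under the full-product assumption this gives 4 × 5 × 5 = 100 settings for Tg and 4 × 4 × 4 = 64 for Dtt.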
Fig. 6 A modified General Linear Model for the Read-Split-Run algorithm