Yongsheng Bai, Jeff Kinne, Lizhong Ding, Ethan C Rath, Aaron Cox, Siva Dharman Naidu.
Abstract
BACKGROUND: It is generally thought that most canonical or non-canonical splicing events involving the U2- and U12-type spliceosomes occur within nuclear pre-mRNAs. However, it remains unclear whether at least some U12-type splicing occurs in the cytoplasm. In recent years, next-generation sequencing technologies have revolutionized the field. The "Read-Split-Walk" (RSW) and "Read-Split-Run" (RSR) methods were developed to identify non-canonical spliced regions genome-wide, including special events occurring in the cytoplasm. As a significant amount of genome and transcriptome data has been generated, for example by the Encyclopedia of DNA Elements (ENCODE) project, we have developed a newer, more memory-efficient version of the algorithm, "Read-Split-Fly" (RSF), which can detect non-canonical spliced regions with higher sensitivity and improved speed. The RSF algorithm also outputs the spliced sequences for downstream biological function analysis.
Keywords: Alternative splicing; ENCODE; Non-canonical; RNA-Seq; Read-Split-Fly
Year: 2017 PMID: 28984182 PMCID: PMC5629565 DOI: 10.1186/s12859-017-1801-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Specificity measured for all detected junctions (red) and for novel junctions only (blue) across 70 ENCODE samples
Fig. 2 Sensitivity of RSF across 70 samples. (a) Sensitivity measured for all known junctions across 70 different samples, separated by RPKM of supporting reads: all detected and possible junctions (blue), Bin 1 (red) RPKM <5, Bin 2 (green) RPKM 5–10, Bin 3 (purple) RPKM 10–50, Bin 4 (light blue) RPKM 50–100, and Bin 5 (orange) RPKM >100. (b) Sensitivity for the genes detected by RSF with a single isoform; bins as in (a). (c) Total sensitivity across all samples in each bin for all genes detected (blue) and for only the single-isoform genes detected (red)
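The RPKM binning described in the Fig. 2 caption can be sketched as follows. This is an illustrative sketch, not the authors' code. The caption does not say which bin a value of exactly 10 or 50 belongs to, so the sketch assumes shared edges go to the higher bin. It also assumes sensitivity means detected known junctions divided by all known junctions in the bin. The function names `rpkm_bin` and `sensitivity_by_bin` are hypothetical.

```python
from collections import defaultdict


def rpkm_bin(rpkm):
    """Return the bin number (1..5) from the Fig. 2 caption for an RPKM value.

    Assumption: the shared edges 10 and 50 are assigned to the higher bin;
    "<5" and ">100" are taken literally from the caption.
    """
    if rpkm < 5:
        return 1
    if rpkm < 10:
        return 2
    if rpkm < 50:
        return 3
    if rpkm <= 100:
        return 4
    return 5


def sensitivity_by_bin(known_junctions, detected):
    """Compute per-bin sensitivity.

    known_junctions: iterable of (junction_id, rpkm) pairs for known junctions.
    detected: set of junction ids reported by the tool.
    Returns {bin: detected_known / total_known} for bins that contain junctions.
    """
    total = defaultdict(int)
    hit = defaultdict(int)
    for junction_id, rpkm in known_junctions:
        b = rpkm_bin(rpkm)
        total[b] += 1
        if junction_id in detected:
            hit[b] += 1
    return {b: hit[b] / total[b] for b in total}
```

Bins with no known junctions are omitted from the result, which avoids dividing by zero.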
Fig. 3 Comparison of average memory usage (a) and average running time (b) between RSR (blue) and RSF (red) for 66 ENCODE samples. Error bars show one standard deviation
Comparison of junctions detected by RSR and RSF
| | RSR | RSF |
|---|---|---|
| Entries | 146,769 | 155,021 |
| Unique Splices | 15,762 | 24,014 |
| Splice junctions in common | 131,007 | |
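The rows of this table are linked arithmetically: for each tool, the entries minus the unique splices should equal the junctions in common. The short check below confirms this for the reported values. The variable names are illustrative only.

```python
# Values copied from the RSR/RSF comparison table.
entries = {"RSR": 146_769, "RSF": 155_021}
unique = {"RSR": 15_762, "RSF": 24_014}
common = 131_007

for tool in entries:
    # Each tool's entries split into its unique splices plus the shared junctions.
    assert entries[tool] - unique[tool] == common

# Additional entries reported by RSF relative to RSR.
extra_rsf = entries["RSF"] - entries["RSR"]
```

Because the common count is identical for both tools, the difference in entries equals the difference in unique splices.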
Number of unique splice junctions with at least 50% greater frequency in Cancer samples than Normal samples, and vice versa
| Category | Unique junctions |
|---|---|
| Total unique junctions (Cancer) | 297 |
| Adenocarcinoma | 297 |
| Neuroblastoma | 1 |
| Cervical Cancer | 2 |
| Breast Cancer | 297 |
| Leukemia | 294 |
| Total unique junctions (Normal) | 611 |
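One possible reading of the "at least 50% greater frequency" criterion is sketched below. It assumes that a junction's frequency in a class is the fraction of that class's samples containing it, and that "50% greater" means a ratio of at least 1.5. Neither definition is confirmed by the text here, and the helper names `class_frequency` and `enriched` are hypothetical.

```python
def class_frequency(sample_junctions):
    """Fraction of samples in a class that contain each junction.

    sample_junctions: list of sets, one set of junction ids per sample.
    Returns {junction_id: fraction of samples containing it}.
    """
    n = len(sample_junctions)
    counts = {}
    for junctions in sample_junctions:
        for j in junctions:
            counts[j] = counts.get(j, 0) + 1
    return {j: c / n for j, c in counts.items()}


def enriched(freq_a, freq_b, ratio=1.5):
    """Junctions whose frequency in class A is at least `ratio` times that in B.

    Assumption: a junction absent from class B (frequency 0) counts as enriched.
    """
    return {j for j, fa in freq_a.items() if fa >= ratio * freq_b.get(j, 0.0)}
```

Calling `enriched(cancer_freq, normal_freq)` and then `enriched(normal_freq, cancer_freq)` gives the two directions of the comparison ("and vice versa").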
Number of junctions reported by RSF for various splicing categories of U12 and U2-type
| Query | U12 Known | U12 Novel | U12 Total | U2 Known | U2 Novel | U2 Total | Grand Total |
|---|---|---|---|---|---|---|---|
| 3p_Full | 6583 | 742 | 7325 | 823 | 155 | 978 | 8303 |
| 5p_Full | 204 | 272 | 476 | 14 | 47 | 61 | 537 |
| branch | 2357 | 386 | 2743 | 0 | 0 | 0 | 2743 |
| Grand Total | 9144 | 1400 | 10,544 | 837 | 202 | 1039 | 11,583 |
Fig. 4 The miRBase miRNA hits for 70 ENCODE samples (only miRNAs with ≥ 30 hits are labeled)
Top hits for disease associated miRNAs
| miRNA name | Hits | Associated diseases |
|---|---|---|
| | 64 | Disease Progression; Lymphoma, Large B-Cell, Diffuse; Melanoma; Neoplasm Metastasis; Neoplasms; Skin Neoplasms; Uterine Cervical Neoplasms |
| | 46 | Acute Disease; Carcinoma, Hepatocellular; Cell Transformation, Neoplastic; Chromosome Deletion; Colorectal Neoplasms; Cri-du-Chat Syndrome; Disease Progression; Glioblastoma; Hematologic Neoplasms; Liver Neoplasms; Lymphoma, Large B-Cell, Diffuse; Melanoma; Microsatellite Instability; Multiple Sclerosis; Neoplasm Metastasis; Neoplasms; Neoplasms, Glandular and Epithelial; Ovarian Neoplasms; Prostatic Neoplasms; Pulmonary Embolism; Skin Neoplasms |
| | 46 | Acute Disease; Carcinoma, Hepatocellular; Cell Transformation, Neoplastic; Chromosome Deletion; Colorectal Neoplasms; Cri-du-Chat Syndrome; Disease Progression; Glioblastoma; Hematologic Neoplasms; Liver Neoplasms; Lymphoma, Large B-Cell, Diffuse; Melanoma; Microsatellite Instability; Multiple Sclerosis; Neoplasm Metastasis; Neoplasms; Neoplasms, Glandular and Epithelial; Ovarian Neoplasms; Prostatic Neoplasms; Pulmonary Embolism; Skin Neoplasms |
| | 31 | Not Available |
| | 30 | Disease Progression; Lymphoma, Large B-Cell, Diffuse; Melanoma; Neoplasm Metastasis; Neoplasms; Skin Neoplasms; Uterine Cervical Neoplasms |
| | 30 | Not Available |
| | 30 | Not Available |
Fig. 5 Number of common junctions for cancer samples (red diamonds) and normal samples (green squares) that are present in a given percentage of samples for each class. The number of samples in each category is normalized by dividing it by the total number of samples in that class
Categorized U12-type and U2-type introns used as queries, and the stretch of sequence each includes
| Categorized names | Stretch of sequence included |
|---|---|
| u12db_3pFull_u12 | 40 bp of the 3′ acceptor site in the intron and 6 bp of the beginning of the right adjacent exon |
| u12db_3pFull_u2 | 40 bp of the 3′ acceptor site in the intron and 6 bp of the beginning of the right adjacent exon |
| u12db_5pFull_u12 | 15 bp of the 5′ donor site in the intron and 10 bp of the end of the left adjacent exon |
| u12db_5pFull_u2 | 15 bp of the 5′ donor site in the intron and 10 bp of the end of the left adjacent exon |
| u12db_branch_u12 | From 10 bp to the left of the branch site to the 3′ acceptor site of the intron |
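The 3pFull and 5pFull query windows in the table above can be sketched as simple string slices. This is a sketch under stated assumptions, not the authors' implementation: the intron is on the plus strand, coordinates are 0-based and half-open (`[intron_start, intron_end)`), the sequence is a plain Python string, and the windows lie within the sequence. The function names are hypothetical.

```python
def three_prime_query(seq, intron_start, intron_end, intron_bp=40, exon_bp=6):
    """3pFull window: last `intron_bp` bases of the intron (3' acceptor side)
    followed by the first `exon_bp` bases of the right adjacent exon.

    `intron_start` is unused here; it is kept so both helpers share a signature.
    Assumption: the caller ensures the window lies within `seq`.
    """
    return seq[intron_end - intron_bp:intron_end + exon_bp]


def five_prime_query(seq, intron_start, intron_end, exon_bp=10, intron_bp=15):
    """5pFull window: last `exon_bp` bases of the left adjacent exon
    followed by the first `intron_bp` bases of the intron (5' donor side).

    `intron_end` is unused here; it is kept so both helpers share a signature.
    Assumption: the caller ensures the window lies within `seq`.
    """
    return seq[intron_start - exon_bp:intron_start + intron_bp]
```

With the default sizes taken from the table, the 3pFull window is 46 bp long and the 5pFull window is 25 bp long. The branch-site query is left out because it needs the branch site position, which this sketch does not model.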
Fig. 6 U12/U2 5′ splice category. The logos in the figure are adapted from Padgett [61]