Jingjing Zhao, Qin Li, Yuchen Li, Xianghuo He, Qiupeng Zheng, Shenglin Huang
Abstract
RNA splicing may generate different kinds of splice junctions, such as linear, back-splice and fusion junctions. Only a limited number of programs are available for detection and quantification of splice junctions. Here, we present Assembling Splice Junctions Analysis (ASJA), a software package that identifies and characterizes all splice junctions from high-throughput RNA sequencing (RNA-seq) data. ASJA processes chimeric alignments from the STAR aligner and assembled transcripts from the StringTie assembler. ASJA provides the unique position and normalized expression level of each junction. Annotations and integrative analysis of the junctions enable additional filtering. ASJA is also suitable for identifying novel junctions. ASJA is available at https://github.com/HuangLab-Fudan/ASJA.
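The three junction types differ in where the two splice sites lie relative to each other. The sketch below is a simplified illustration, not ASJA's actual identification model: it assumes each junction is given as a donor end and an acceptor end (chromosome, position, strand) and applies a purely positional rule. `JunctionEnd` and `classify_junction` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JunctionEnd:
    """One side of a splice junction (hypothetical representation)."""
    chrom: str
    pos: int     # genomic coordinate of the splice site
    strand: str  # "+" or "-"


def classify_junction(donor: JunctionEnd, acceptor: JunctionEnd) -> str:
    """Return 'linear', 'back-splice' or 'fusion' using a positional rule.

    Assumed rule, not ASJA's model:
    - ends on different chromosomes or strands -> fusion
    - acceptor downstream of the donor (in transcript orientation) -> linear
    - acceptor upstream of the donor (in transcript orientation) -> back-splice
    """
    if donor.chrom != acceptor.chrom or donor.strand != acceptor.strand:
        return "fusion"
    # On the minus strand, transcription proceeds toward smaller coordinates.
    if donor.strand == "+":
        downstream = acceptor.pos > donor.pos
    else:
        downstream = acceptor.pos < donor.pos
    return "linear" if downstream else "back-splice"


if __name__ == "__main__":
    print(classify_junction(JunctionEnd("chr1", 1000, "+"),
                            JunctionEnd("chr1", 2000, "+")))  # prints: linear
```

A positional rule alone cannot separate a fusion between two neighbouring genes on the same strand from an ordinary linear or back-splice junction; resolving those cases needs gene annotation, which is why annotation-based filtering matters for fusion calls.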
Keywords: Circular RNA; RNA splicing; RNA-seq; Splice junctions
Year: 2019 PMID: 31462970 PMCID: PMC6709372 DOI: 10.1016/j.csbj.2019.08.001
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Schematic overview of the ASJA workflow. The ASJA architecture consists of three layers (from top to bottom): chimeric alignment by STAR, junction identification by separate models based on the characteristics of each of the three junction types, and finally integration of splice junction utilization ratios and gene status.
Fig. 2. Performance of ASJA on the validation dataset. (A) Venn diagram showing the number of circRNAs predicted by three circRNA prediction tools in 12 normal tissues. (B) Overlap of prediction results between two samples (RNaseR+, ribominus RNA treated with RNase R; RNaseR-, ribominus RNA without RNase R treatment) for the three tools. (C) Time (minutes) taken by each tool to analyse each run of the validation samples.
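Panels A and B compare the prediction sets of different tools. Assuming each tool's output has been reduced to a set of circRNA identifiers (for example `chrom:start-end`), the count in each exclusive region of a Venn diagram can be computed as below. `venn_regions`, the identifiers and the tool names other than ASJA are placeholders; the figure caption does not name the other two tools.

```python
from collections import Counter


def venn_regions(predictions: dict[str, set[str]]) -> Counter:
    """Count items in each exclusive region of a Venn diagram.

    Each key is the frozenset of tools that predicted an item; the value is the
    number of items predicted by exactly those tools and by no other tool.
    """
    regions = Counter()
    for item in set().union(*predictions.values()):
        regions[frozenset(t for t, s in predictions.items() if item in s)] += 1
    return regions


# Toy input, not data from the study.
toy = {
    "ASJA": {"chr1:100-900", "chr2:50-400", "chr3:10-700"},
    "tool_B": {"chr2:50-400", "chr3:10-700", "chr5:20-300"},
    "tool_C": {"chr3:10-700"},
}
for tools, n in sorted(venn_regions(toy).items(), key=lambda kv: -len(kv[0])):
    print(sorted(tools), n)
```

Keying each region by the set of tools that agree makes the same function work for two tools (panel B) or three (panel A).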
Performance of different fusion junction detectors on samples with validated fusions.
| Sample | Statistic | ASJA | MapSplice2 | deFuse |
|---|---|---|---|---|
| CGGA_661 (1)* | Total | 3 | 24 | 35 |
| | TP | 1 | 1 | 1 |
| | Recall | 100% | 100% | 100% |
| | Precision | 33.30% | 4.16% | 2.85% |
| CGGA_374 (3) | Total | 24 | 38 | 73 |
| | TP | 2 | 2 | 3 |
| | Recall | 66.70% | 66.70% | 100% |
| | Precision | 8.30% | 5.20% | 4.10% |
| CGGA_1329 (5) | Total | 48 | 71 | 153 |
| | TP | 4 | 5 | 5 |
| | Recall | 80% | 100% | 100% |
| | Precision | 8.33% | 7.04% | 3.26% |
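Note: *The value in parentheses is the number of validated fusions in each sample. Total, number of fusion junctions reported by each tool; TP, true positives (reported fusions that are validated); Recall = TP / number of validated fusions; Precision = TP / Total.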
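The recall and precision columns follow directly from the counts in the table. A minimal sketch (the function names are illustrative), checked against the CGGA_1329 row for ASJA:

```python
def recall(tp: int, n_validated: int) -> float:
    """Fraction of validated fusions that a tool detected."""
    return tp / n_validated


def precision(tp: int, total: int) -> float:
    """Fraction of a tool's reported fusions that are validated (0 if none reported)."""
    return tp / total if total else 0.0


# CGGA_1329 has 5 validated fusions; ASJA reports 48 fusions, 4 of them validated.
print(f"recall={recall(4, 5):.0%} precision={precision(4, 48):.2%}")
# prints: recall=80% precision=8.33%
```

Because precision divides by the total number of reported fusions, a tool that reports fewer candidates at the same TP count scores higher, which is the pattern the table shows for ASJA.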
The number of junctions in each sample.
| Sample | Linear junction | Back-splice junction | Fusion junction |
|---|---|---|---|
| Brain01 | 187,293 | 14,055 | 0 |
| Brain02 | 165,503 | 9542 | 0 |
| Colon01 | 164,370 | 5365 | 0 |
| Colon02 | 167,407 | 5736 | 1 |
| Stomach01 | 171,263 | 4226 | 1 |
| Stomach02 | 151,269 | 2733 | 1 |
| Liver01 | 152,997 | 3920 | 0 |
| Liver02 | 139,087 | 2894 | 1 |
| Heart01 | 162,130 | 8084 | 0 |
| Heart02 | 151,089 | 6351 | 0 |
| Lung01 | 173,994 | 6304 | 2 |
| Lung02 | 172,911 | 6169 | 1 |
| BLCA_N | 169,543 | 3465 | 0 |
| BLCA_T | 166,492 | 5936 | 2 |
| BRCA_N | 151,849 | 9875 | 0 |
| BRCA_T | 173,950 | 10,887 | 13 |
| CRC_N | 172,163 | 5600 | 3 |
| CRC_T | 172,762 | 4696 | 0 |
| GC_N | 146,109 | 3469 | 2 |
| GC_T | 155,763 | 2278 | 0 |
| HCC_N | 163,502 | 5338 | 0 |
| HCC_T | 171,358 | 5513 | 4 |
| KCA_N | 167,979 | 6101 | 1 |
| KCA_T | 163,515 | 7895 | 1 |
| PRAD_N | 169,059 | 6070 | 1 |
| PRAD_T | 161,242 | 3955 | 1 |
| Sum of unique junctions | 322,675 | 81,484 | 33 |
Note: BLCA, bladder urothelial carcinoma; BRCA, breast cancer; CRC, colorectal cancer; HCC, hepatocellular carcinoma; GC, gastric cancer; KCA, kidney clear cell carcinoma; PRAD, prostate adenocarcinoma. The suffixes _N and _T denote normal and tumor samples, respectively.
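Because the last row counts unique junctions, it is best read as the size of the union across samples rather than the column total; the per-sample fusion counts, for instance, add up to 35 while the row reports 33. A minimal sketch, assuming each junction is keyed by a hypothetical (chromosome, donor, acceptor, strand) tuple and using toy data:

```python
Junction = tuple[str, int, int, str]  # (chrom, donor, acceptor, strand), assumed key


def count_unique(per_sample: dict[str, set[Junction]]) -> int:
    """Number of distinct junctions across all samples (size of the union)."""
    return len(set().union(*per_sample.values()))


# Toy input, not data from the study.
samples = {
    "sample_A": {("chr1", 1200, 1850, "+"), ("chr2", 500, 900, "-")},
    "sample_B": {("chr1", 1200, 1850, "+")},
}
print(sum(len(s) for s in samples.values()), count_unique(samples))  # prints: 3 2
```

A junction shared by several samples is counted once in the union, so the unique total can be far smaller than the column sum, as it is for linear junctions here.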
Fig. 3. Characteristics of linear junctions identified by ASJA in human cells. (A) Pie chart showing the numbers of raw (grey) and high-confidence (red) linear junctions. (B) Distribution of linear junctions across annotated known genes; unannotated junctions are shown as novel junctions. (C) Pie chart showing the proportions of gene-isoform and intergenic junctions among novel junctions. (D) Bar chart showing the number of novel junctions in different normal tissues; gene-isoform junctions are shown in red and intergenic junctions in blue. (E) Clustering analysis of all junctions differentially expressed between cancer tissues and NCTs. The heatmap is based on the expression values of junctions with log2(fold change) > 1 and p < .01 (Wilcoxon test).
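Panel E selects junctions by a fold-change cutoff and a Wilcoxon test. The caption does not state whether the paired signed-rank or the unpaired rank-sum form was used, or how fold change was computed, so the sketch below rests on assumptions: an unpaired, two-sided rank-sum test with the normal approximation, a fold change of group means with a hypothetical pseudocount of 1, and the cutoff read as |log2(fold change)| > 1 so that down-regulated junctions are kept too. `rank_sum_pvalue` and `is_differential` are illustrative names, not ASJA functions.

```python
import math
from statistics import mean


def rank_sum_pvalue(x: list[float], y: list[float]) -> float:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) p-value.

    Uses the normal approximation with average ranks for ties and no tie or
    continuity correction, so it is only a rough guide for small samples.
    """
    values = x + y
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the block of tied values starting at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for group x
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def is_differential(tumor, normal, fc_cutoff=1.0, p_cutoff=0.01, pseudo=1.0) -> bool:
    """Apply the |log2 fold change| > fc_cutoff and p < p_cutoff filter."""
    log2fc = math.log2((mean(tumor) + pseudo) / (mean(normal) + pseudo))
    return abs(log2fc) > fc_cutoff and rank_sum_pvalue(tumor, normal) < p_cutoff


# Toy expression values for one junction, not data from the study.
tumor = [50, 60, 55, 70, 65, 58, 62]
normal = [5, 6, 4, 7, 8, 5.5, 6.5]
print(is_differential(tumor, normal))  # prints: True
```

For real analyses, `scipy.stats.mannwhitneyu` provides exact and tie-corrected p-values, and `scipy.stats.wilcoxon` covers the paired case.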
Fig. 4. Characteristics of back-splice junctions (circRNAs) identified by ASJA in human cells. (A) Pie chart showing the numbers of annotated and unannotated circRNAs. (B) Bar plots showing the number of genes producing different numbers of back-splicing events. (C) Genomic origin of circRNAs. The pie chart shows the genomic distribution of all predicted/annotated circRNAs. (D) Multidimensional scaling screen for the identification of highly abundant circRNAs. Red and grey dots represent highly abundant and low-abundance circRNAs, respectively.