Thomas B Hansen, Morten T Venø, Christian K Damgaard, Jørgen Kjems.
Abstract
CircRNAs are novel members of the non-coding RNA family. CircRNAs have been known to exist for several decades, but their widespread abundance has only recently become appreciated. Annotation of circRNAs depends on sequencing reads that span the backsplice junction and therefore map as non-linear reads in the genome. Several pipelines have been developed to identify these non-linear reads specifically and thereby predict the landscape of circRNAs from deep sequencing datasets. Here, we use common RNAseq datasets to scrutinize and compare the output from five different algorithms (circRNA_finder, find_circ, CIRCexplorer, CIRI, and MapSplice) and evaluate the levels of bona fide and false positive circRNAs based on RNase R resistance. With this approach, we observe surprisingly dramatic differences between the algorithms, specifically regarding the highly expressed circRNAs and the circRNAs derived from proximal splice sites. Collectively, this study emphasizes that circRNA annotation should be handled with care and that several algorithms should ideally be combined to achieve reliable predictions.
Year: 2015 PMID: 26657634 PMCID: PMC4824091 DOI: 10.1093/nar/gkv1458
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Overview of algorithms
| Tool | Version | Language | Mapper | | #circRNAs / # | Pros | Cons |
|---|---|---|---|---|---|---|---|
| circRNA_finder | N/A | Perl | STAR | Yes | 1532/926 (60.4%) | Fast | Low sensitivity and low accuracy |
| CIRCexplorer | 1.0.6 | Python | Bowtie1 and 2 | No | 2638/1845 (69.9%) | High accuracy and good sensitivity | Slow, gene annotation requirement |
| CIRI | 1.2 | Perl | BWA | Yes | 4067/2279 (56.0%) | High sensitivity | Slow, high RAM requirements and low accuracy |
| find_circ | N/A | Python | Bowtie2 | Yes | 2336/1388 (59.4%) | Fast and low RAM requirements | Low accuracy |
| MapSplice | 2.1.8 | Python | Bowtie1 | No | 2376/1738 (73.1%) | High accuracy and good sensitivity | Slow, gene annotation requirement |
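
The percentage in each row can be reproduced from the two counts in the same cell. The sketch below is a minimal check in Python, assuming the percentage is the second count divided by the first. The column header is truncated in this copy, so the names `total` and `subset` are placeholders, not the authors' labels.

```python
# Counts copied from the table above: (first count, second count, reported %).
rows = {
    "circRNA_finder": (1532, 926, 60.4),
    "CIRCexplorer": (2638, 1845, 69.9),
    "CIRI": (4067, 2279, 56.0),
    "find_circ": (2336, 1388, 59.4),
    "MapSplice": (2376, 1738, 73.1),
}


def percent(subset: int, total: int) -> float:
    """Return subset as a percentage of total, rounded to one decimal place."""
    return round(100.0 * subset / total, 1)


# Recompute the percentage for every tool from its two counts.
recomputed = {
    tool: percent(subset, total)
    for tool, (total, subset, _reported) in rows.items()
}

if __name__ == "__main__":
    for tool, (total, subset, reported) in rows.items():
        print(f"{tool}: {subset}/{total} = {recomputed[tool]}% (table: {reported}%)")
```

Under that assumption, the recomputed values agree with the percentages printed in the table for all five tools.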
Figure 1. Prediction of circRNAs by five different prediction algorithms. (A) Venn diagram depicting the overlap between the five circRNA prediction algorithms. (B and C) Stacked barplot of RNase R resistance of all predicted circRNAs (B) or exotic circRNAs (C, found by only one algorithm), divided into RNase R resistant (green), unaffected (grey) and RNase R sensitive (red), as denoted. Percentage reflects the fraction of RNase R sensitive circRNAs. (D) Stacked barplot of circRNA annotation divided into exonic (green), unannotated (grey), or lariat (red). (E and F) Ranked plot of the top 100 expressed circRNAs (E) or top 100 exotic circRNAs (F) predicted by each algorithm, color-coded as in B. Percentage reflects the fraction of RNase R sensitive circRNAs (false positives) within the plotted top 100.
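
To make the three RNase R categories concrete, here is a minimal Python sketch of one way such a classification could be computed from read counts in an untreated and an RNase R treated library. The fold-change thresholds (`enrich`, `deplete`), the pseudocount, and the toy counts are all hypothetical, chosen only for illustration. The cut-offs actually used for the figure are not given in this text, and real counts would first need normalising for library size.

```python
def classify(untreated: float, treated: float,
             enrich: float = 2.0, deplete: float = 0.5,
             pseudocount: float = 1.0) -> str:
    """Label a circRNA candidate by its treated/untreated read ratio.

    The thresholds are illustrative assumptions, not the published cut-offs.
    """
    ratio = (treated + pseudocount) / (untreated + pseudocount)
    if ratio >= enrich:
        return "resistant"   # enriched after RNase R treatment
    if ratio <= deplete:
        return "sensitive"   # depleted after treatment (likely false positive)
    return "unaffected"


# Hypothetical toy data: candidate -> (untreated reads, RNase R treated reads).
toy_counts = {
    "circ1": (10, 50),
    "circ2": (20, 22),
    "circ3": (40, 4),
    "circ4": (8, 30),
}

labels = {name: classify(u, t) for name, (u, t) in toy_counts.items()}

# Fraction of candidates labelled sensitive, expressed as a percentage.
sensitive_pct = 100.0 * sum(v == "sensitive" for v in labels.values()) / len(labels)

if __name__ == "__main__":
    print(labels)
    print(f"RNase R sensitive: {sensitive_pct:.1f}%")
```

The final percentage corresponds to the kind of value annotated on the barplots: the share of an algorithm's candidates that fall in the sensitive category.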
Figure 2. Sensitivity and splice site distance. (A and B) Cumulative plot of read count (A) and barplot showing mean number of reads (B) for the 854 circRNA species predicted by all five algorithms. (C and D) For each algorithm, the duration in minutes (C) or the maximum RAM usage in gigabytes (GB) (D) when predicting circRNAs in the datasets denoted. Numbers reflect average duration or average RAM usage. (E) Cumulative plot of splice site distances for the circRNAs predicted by each algorithm. (F) As in E but with a delimited X-axis scale. (G) Barplot as in Figure 1B of circRNAs with splice sites less than 500 bp apart. (H and I) Ranked distance plot of all predicted circRNAs (H) and exotic circRNAs only (I), color-coded as denoted.
Figure 3. Combining prediction algorithms. (A) Stacked barplot of circRNA candidates common to pairs of prediction algorithms, as denoted. Color-coded as in Figure 1B. ‘All combined’ denotes circRNA species identified by all five algorithms. (B and C) Ranked expression plot of the top 100 circRNA species identified by all algorithm pairs (B) or by all five algorithms combined (C), as in Figure 1E.
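
Combining algorithms, as described in the captions above, comes down to set operations on predicted backsplice junctions. The following Python sketch assumes each tool's output has already been parsed into a set of `(chromosome, start, end, strand)` tuples under one shared coordinate convention. The tool names and coordinates are hypothetical toy data, and the parsing step is not shown because each tool has its own output format.

```python
from itertools import combinations

# Hypothetical toy predictions keyed by backsplice coordinates.
predictions = {
    "tool_a": {("chr1", 100, 900, "+"), ("chr2", 50, 400, "-"), ("chr3", 10, 70, "+")},
    "tool_b": {("chr1", 100, 900, "+"), ("chr2", 50, 400, "-")},
    "tool_c": {("chr1", 100, 900, "+"), ("chr4", 5, 95, "+")},
}


def pairwise_common(preds):
    """Candidates shared by each pair of tools (paired prediction)."""
    return {
        (a, b): preds[a] & preds[b]
        for a, b in combinations(sorted(preds), 2)
    }


def common_to_all(preds):
    """Candidates reported by every tool ('All combined')."""
    return set.intersection(*preds.values())


def exotic(preds):
    """Candidates reported by exactly one tool."""
    result = {}
    for name, calls in preds.items():
        others = set().union(*(c for n, c in preds.items() if n != name))
        result[name] = calls - others
    return result


pairs = pairwise_common(predictions)
shared = common_to_all(predictions)
unique = exotic(predictions)

if __name__ == "__main__":
    for pair, calls in pairs.items():
        print(pair, len(calls))
    print("all combined:", shared)
    print("exotic:", unique)
```

Exact tuple matching is a deliberately strict choice: tools that differ by one base in their coordinate convention would appear to disagree, so coordinates need harmonising before any intersection is taken.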