MOTIVATION: Recent studies reveal an important role of non-coding circular RNA (circRNA) in the control of cellular processes. Because of differences in the organization of plant and mammal genomes, circRNA prediction programs based on algorithms developed for animals and humans show poor sensitivity and accuracy in plants. RESULTS: We developed a circRNA prediction software for plants (termed PcircRNA_finder) that is more sensitive in detecting circRNAs than other frequently used programs (such as find_circ and CIRCexplorer). Based on analysis of simulated and real rRNA-/RNAase R RNA-Seq data from Arabidopsis thaliana and rice, PcircRNA_finder provides a more comprehensive, sensitive and precise prediction method for plant circRNAs. AVAILABILITY AND IMPLEMENTATION: http://ibi.zju.edu.cn/bioinplant/tools/manual.htm CONTACT: fanlj@zju.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Non-coding circular RNA (circRNA) is a covalently closed, continuous loop that usually originates from exonic regions (named exonic circRNA), but can also arise from intronic and intergenic regions. CircRNAs can function as miRNA sponges (Hansen; Memczak) and have the potential to enhance transcription of their host genes (Li; Zhang et al., 2013). The emergence of rRNA-depleted high-throughput RNA-Seq technology provides a revolutionary approach for the systematic discovery of circRNAs in various species, including human, mouse, Arabidopsis and rice (Lu; Ye).
A robust method for circRNA identification is an important tool for investigating the role of these molecules. The available circRNA prediction methods (e.g. find_circ and CIRCexplorer) were primarily developed for use with human or animal datasets (Memczak et al., 2013; Pan and Xiong, 2015; Salzman; Szabo; Zhang). Because mammal and plant genomes differ considerably, the currently available methods detect circRNAs in plants with relatively low accuracy and sensitivity (Ye). In this study, we developed a software tool (termed PcircRNA_finder) that shows a more comprehensive ability and greater sensitivity and precision in predicting circRNAs in plants.
2 Materials and Methods
PcircRNA_finder is mainly designed for exonic circRNA prediction and consists of three modules, as shown in Figure 1. These modules are: (i) Catcher, which collects all backsplice sites by chiastic clipping mapping of paired-end (PE) reads, based on the main available fusion detection methods, including Tophat-Fusion (Kim and Salzberg, 2011), STAR-Fusion (Dobin et al., 2013), find_circ (Memczak), Mapsplice (Wang) and segemehl (Hoffmann). False positives among these candidate backsplice sites are filtered out later in the Filter module. The increased read mapping accuracy in our program excludes some false predictions due to the high copy number of genes in plants (Supplementary Data). (ii) Annotator, which annotates the candidate exonic backsplice sites based on the available gene annotation. Recent studies have demonstrated that the backsplicing sites of circRNAs are flexible and that alternative splicing of circRNAs is prevalent (Starke; Szabo). Much of this alternative splicing occurs near canonical splicing sites (Szabo; Starke); therefore, candidate backsplice sites are allowed to lie within 5 bp of the two canonical backsplice sites (acceptor and donor). (iii) Filter, which is a quality control module for the above candidate circRNAs. It creates a pseudoRef file with the flanking sequences of chiastic backsplice sites, maps the raw reads to it and thereby confirms the backsplice sites. It also requires that the candidate circRNAs carry at least one of two kinds of splicing signals: that of the U2-based spliceosome (usually with a consensus sequence of GT-AG or GC-AG) or that of the U12-based minor spliceosome (usually with a consensus sequence of AT-AC) (Reddy; Staiger and Brown, 2013).
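The checks described for the Annotator and Filter modules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PcircRNA_finder implementation: the function names, the 0-based half-open coordinates and the way the junction sequence is assembled (end of the circularized region joined to its start) are all hypothetical choices made for the example.

```python
# Illustrative sketch only; these helpers are hypothetical and are NOT
# taken from the PcircRNA_finder source code.

FLANK_BP = 5  # tolerance around canonical splice sites, as stated in the text

# (donor, acceptor) dinucleotide pairs named in the text:
# GT-AG and GC-AG (U2-type), AT-AC (U12-type).
SPLICE_SIGNALS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def near_annotated_site(pos, annotated_sites, flank=FLANK_BP):
    """True if `pos` lies within `flank` bp of any annotated splice site."""
    return any(abs(pos - site) <= flank for site in annotated_sites)


def has_splice_signal(donor_dinuc, acceptor_dinuc):
    """True if the dinucleotide pair matches one of the listed signals."""
    return (donor_dinuc.upper(), acceptor_dinuc.upper()) in SPLICE_SIGNALS


def pseudo_junction(genome, start, end, flank_len):
    """Build a junction-spanning sequence for a candidate circle.

    Assumption: the circle covers genome[start:end] (0-based, half-open)
    on the forward strand, so a read crossing the backsplice junction
    reads the last `flank_len` bases of the region followed by its
    first `flank_len` bases.
    """
    return genome[end - flank_len:end] + genome[start:start + flank_len]


if __name__ == "__main__":
    toy_genome = "AAAACCCCGGGGTTTT"
    print(near_annotated_site(1003, [1000, 2000]))
    print(has_splice_signal("gt", "ag"))
    print(pseudo_junction(toy_genome, 4, 12, 3))
```

Raw reads that align across the midpoint of such a junction sequence would support the candidate backsplice site; the aligner and the support threshold are left out of this sketch because the text does not specify them.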
Fig. 1.
The flowchart of PcircRNA_finder for circRNA prediction. It consists of three modules (stages)
3 Benchmark
To test the performance of PcircRNA_finder, we first compared it with two popular circRNA finding algorithms (find_circ and CIRCexplorer) on a simulated dataset. Simulated RNA-Seq data (paired-end reads, 100 bp, and 6000 backsplicing reads for each sample) were generated by randomly choosing 200 chiastic transcripts based on the Arabidopsis thaliana and rice genome annotations, respectively (Supplementary Data). Sensitivity, precision and sensitivity + precision (a comprehensive value) (Chuang) were used to evaluate the performance of the three methods. The results indicate that PcircRNA_finder has a higher sensitivity (74–88%) than either find_circ or CIRCexplorer (each about 20%), with a precision of 63–67% compared with 72 and 100% for find_circ and CIRCexplorer, respectively, in the two test genomes (Supplementary Data). Overall, PcircRNA_finder obtained a significantly higher comprehensive value in the two test plant species (68–76%) than the other two methods (each <35%).
Transcriptomic data were generated from three RNA-Seq libraries (‘RNAase R’, ‘rRNA-’ and ‘polyA’) of rice seedlings (Supplementary Data). ‘RNAase R’ refers to the library in which linear RNAs isolated from the rice seedlings were degraded by RNAase R treatment (Circle-Seq, Jeck and Sharpless, 2014). CircRNAs in the various samples were predicted using all three circRNA prediction methods. Using PcircRNA_finder, we found 1,113 circRNAs in the RNAase R sample, compared to 915 and 933 predicted by find_circ and CIRCexplorer, respectively. Of the circRNAs detected by PcircRNA_finder, 567 were not found using the other prediction programs. We define high-confidence circRNAs as those predicted circRNAs shared by the ‘RNAase R’ and ‘rRNA-’ libraries but absent from the ‘polyA’ library. Based on this definition, PcircRNA_finder predicted more high-confidence circRNAs from the rice RNA-Seq data (117) than either of the other two methods (104 and 74) (Supplementary Data).
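The two evaluation steps above reduce to simple set operations. The following Python sketch assumes that each circRNA is represented by a hashable key such as a (chromosome, start, end) tuple and that sensitivity and precision have their usual definitions (true positives over simulated circRNAs, and true positives over predictions). The function names and the toy data are hypothetical, and the combined "sensitivity + precision" value is omitted because the text does not state its formula.

```python
# Illustrative sketch only; names and toy data are hypothetical.

def sensitivity_precision(predicted, truth):
    """Return (sensitivity, precision) for predicted vs. simulated circRNAs."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)
    sensitivity = true_pos / len(truth) if truth else 0.0
    precision = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, precision


def high_confidence(rnase_r, rrna_minus, poly_a):
    """circRNAs shared by the 'RNAase R' and 'rRNA-' libraries
    but absent from the 'polyA' library."""
    return (set(rnase_r) & set(rrna_minus)) - set(poly_a)


if __name__ == "__main__":
    truth = {("chr1", 100, 500), ("chr1", 900, 1400),
             ("chr2", 50, 300), ("chr3", 10, 200)}
    predicted = {("chr1", 100, 500), ("chr1", 900, 1400), ("chr4", 5, 80)}
    print(sensitivity_precision(predicted, truth))

    print(high_confidence({"c1", "c2", "c3"}, {"c2", "c3", "c4"}, {"c3"}))
```

In practice the keys would need to tolerate small coordinate differences between tools before taking intersections; that matching step is not shown here.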