A Brief Review of circRNA Biogenesis, Detection, and Function.

Ying Liang¹, Niannian Liu¹, Le Yang¹, Jianjun Tang¹, Yinglong Wang¹, Meng Mei².

Abstract

Circular RNA (circRNA) is a non-coding molecule produced through alternative splicing of one or more exons of a gene in the presence of an RNA-induced silencing complex (RISC). Its formation depends on complementary intron sequences on both sides of the circularized sequence. CircRNA functions as a sponge for miRNA, acting as a transcriptional regulator or a potential biomarker. It has an impact on fetal growth and on synaptic facilitation in the brain. In this review, we illustrate the biogenesis mechanisms, characteristics, and functions of circRNAs. We also summarize methods that use sequence features and RNA next-generation sequencing data for circRNA prediction. Finally, we discuss the state of research on circRNAs in diseases, which will bring new contributions to future disease treatments.

Entities: Chemical

Keywords: CircRNA; alternative splicing; biomarker; complementary intron; disease; miRNA sponge

Year: 2021 PMID: 35386433 PMCID: PMC8905635 DOI: 10.2174/1389202922666210331130722

Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.689

INTRODUCTION

Circular RNA (circRNA) is a non-coding RNA (ncRNA) molecule widely found in eukaryotes. CircRNA has neither a 5' cap nor a 3' poly(A) tail and forms a covalently closed ring. CircRNA is stable because it is not easily degraded by RNA exonucleases [1]. CircRNAs have a wide range of origins and tissue specificities and play a variety of roles in the growth and development of organisms. CircRNA molecules are rich in microRNA (miRNA) binding sites and thus act as miRNA sponges in cells. Such binding hinders the inhibitory effects of miRNAs on their target genes and increases the expression level of these genes [2]. CircRNA was first discovered during the study of potato spindle tuber disease (PSTD), which is caused by virus-like pathogens (viroids) that have no protein coat and whose single-stranded genome forms a closed circular RNA molecule [3]. Later, using electron microscopy, researchers observed circRNA in the cytoplasm of eukaryotic cells, and a circular transcript of the SRY gene was found to be highly expressed in mouse testis [4-6]. These findings attracted little attention until next-generation sequencing technology emerged: in 2012, Salzman et al. discovered that non-linear circular transcripts formed by exon rearrangements existed in normal human primary blood cells, leukemia cells, and HeLa cell lines [7]. Since the first report on circRNA, a large number of circRNA molecules have been identified. Although circRNAs have been known to exist in human cells for more than two decades, their significance has been appreciated only recently [7-9]. Studies have shown that the expression and role of circRNAs are related to the occurrence and development of various human diseases (cardiovascular diseases, diabetes mellitus, cancer) and also to plant diseases, such as potato bacterial soft rot.
CircRNAs are also related to biological tissue development and cell aging. Expression of specific circRNAs in different biological samples makes them ideal biomarkers for disease diagnosis and to assess tissue development. The elucidation of their mechanisms of action driving diseases could also designate circRNAs as therapeutic targets [10].

BIOGENESIS, CHARACTERISTICS, AND FUNCTIONS OF CircRNA

Biogenesis

According to gene structure annotation, circRNAs have five main origins: 1) exon-only circular RNA (exonic circRNA) [7]; 2) intron-only circular RNA (circular intronic RNA) [11]; 3) back-splicing of upstream exons with intron retention (exon-intron circular RNA) [12]; 4) circular RNA from fusion genes (f-circRNA) [13, 14]; and 5) read-through circular RNA formed by polymerase II (rt-circRNA) [15]. Although the origins of circRNAs are diverse, circRNAs are mainly produced by head-to-tail back-splicing. The formation of circular RNA is regulated by cis-acting and trans-acting elements through mechanisms that are not completely understood. Three mechanisms are currently known to drive the formation of circular RNA (Fig. ). In intron-pairing-driven circularization (Fig. ), the intron sequences on both sides of the exon pair through complementary bases, so that the 5′ splice donor site of the mRNA precursor joins directly with the 3′ splice acceptor site to form a circular RNA. Intron-pairing-driven circularization is also called the direct back-splicing model [16]. Studies have found that introns of less than 100 nt with reverse complementary repeat sequences can promote exon circularization. Reverse complementary matches (RCMs), intronic complementary sequences (ICSs), and Alu elements are complementary pairing sequences that are abundant in the flanking introns of circularizable exons. They effectively promote the pairing of flanking introns and circularization [17, 18]. In RNA-binding-protein-driven circularization (Fig. ), RNA-binding proteins promote the formation of tissue-specific circular RNA by binding specific motifs in flanking intron sequences [19-21]. For example, in Drosophila melanogaster and humans, the second exon of the MBL/MBNL1 gene can form a circular RNA that depends on a specific binding motif in the flanking introns of this gene [19].
Heterogeneous nuclear ribonucleoprotein L (HNRNPL) promotes the formation of circular RNA by binding these flanking sequences [22]. In lariat-driven circularization (Fig. ), when the mRNA precursor undergoes GU/AG splicing, exon skipping produces a lariat intermediate that contains introns and exons, which then undergoes back-splicing to form a circular RNA. Current studies in Arabidopsis thaliana, Solanum lycopersicum, and Zea mays have shown that lariat-driven circular RNAs are widespread in plants [16, 23, 24].
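The intron-pairing idea can be illustrated with a small Python sketch that looks for short reverse complementary matches between the two flanking introns of an exon. This is only an illustration under stated assumptions: the function names, the seed length `k`, and the toy sequences are hypothetical, and real analyses generally use alignment tools and repeat annotations rather than exact k-mer matching.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_rcm_seeds(upstream_intron: str, downstream_intron: str, k: int = 8):
    """Find k-mers in the upstream intron whose reverse complement occurs
    in the downstream intron (candidate base-pairing seeds).

    Returns a sorted list of (upstream_pos, downstream_pos, kmer) tuples
    with 0-based start positions.
    """
    upstream = upstream_intron.upper()
    downstream = downstream_intron.upper()
    # Index every k-mer of the downstream intron by its start positions.
    index = {}
    for j in range(len(downstream) - k + 1):
        index.setdefault(downstream[j:j + k], []).append(j)
    hits = []
    for i in range(len(upstream) - k + 1):
        kmer = upstream[i:i + k]
        for j in index.get(reverse_complement(kmer), []):
            hits.append((i, j, kmer))
    return sorted(hits)


# Toy flanking introns (hypothetical): an 11-nt stretch of the upstream
# intron can pair with an 11-nt stretch of the downstream intron.
upstream = "TTTTGGCTCACGCCTTTTT"
downstream = "CCCCAGGCGTGAGCCCCCC"
hits = find_rcm_seeds(upstream, downstream, k=8)
for up_pos, down_pos, kmer in hits:
    print(up_pos, down_pos, kmer)
```

Overlapping seeds that lie on one anti-diagonal, as in this toy case, belong to a single longer complementary stretch; merging them would be the natural next step.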

Characteristics

CircRNAs have unique characteristics that distinguish them from linear non-coding RNAs. They are widely expressed in human cells, and their expression sometimes exceeds 10 times the level of their linear isoforms [7, 8]. They are more stable in the human body than linear RNAs because they are not easily degraded by RNase R [9]. CircRNAs contain microRNA response elements that can interact with miRNAs to regulate the expression of target genes. Most circRNAs are formed from exons, while a few are derived from introns only. Most of them are non-coding, highly conserved, and located in the cytoplasm; only a few are located in the nucleus [2]. Most circRNAs can play a regulatory role at the transcriptional and post-transcriptional levels; only a few of them play a role in transcription [11].

Functions

CircRNAs have multiple binding sites for miRNAs, which gives them a sponge-like function [25]. In cancers, accumulating evidence shows that circRNAs such as circRNA-Ctfrc act as sponges for miRNAs and participate in the transcriptional regulation of target genes [26]. Researchers have found such miRNA sponge function in rice, wheat, tomato, tangerine, and Arabidopsis. Some circRNAs also have one or more binding sites for RNA-binding proteins [27]. CircRNAs can regulate the alternative splicing of host genes. In Arabidopsis thaliana, a circRNA can bind strongly to the DNA locus of its host gene to form an RNA:DNA hybrid, or R-loop. The R-loop can inhibit the transcription of this region. An exon-skipping alternative splicing event then occurs, which promotes the production of an alternative splicing transcript variant (SEP3.3) and affects the flowering phenotype [28]. In addition, data from bamboo showed that the frequency of alternative splicing events in genes producing circular RNA was significantly higher than in randomly selected genes, indicating that circular RNA could regulate the alternative splicing of linear genes [29]. CircRNAs can regulate the expression of host genes. Early studies have shown that EIciRNA can interact with U1 snRNA to form the EIciRNA-U1 snRNP complex. The complex interacts with the Pol II transcription complex to promote the expression of host genes [12]. In plant research, Lu et al. constructed a rice overexpression transformation system and found that circRNAs and their linear isoforms can inhibit the post-transcriptional expression of host genes [30]. CircRNAs have a potential translational function. There are a large number of consensus m6A motifs on circRNAs in human cells. With the participation of various proteins, a single m6A site can activate the translation of a circRNA [31].
In Drosophila melanogaster, a particular class of circRNA was found to use the promoter of host genes to combine with ribosomes and promote translation [32]. Recently, it has been found that a high concentration of RNA circularized in vitro can induce the expression of proteins in mice, which indicates that circRNA can be used as an effective protein expression tool [33]. CircRNAs can induce the formation of pseudogenes. Pseudogenes are derived from linear mRNAs that are reverse transcribed and integrated into the genome. Linear mRNAs can form pseudogenes that have the same sequences as the exons. Stable circRNAs can also be reverse transcribed to produce pseudogenes with reversed sequences, which are then integrated into the genome [34].
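The sponge function rests on the number of miRNA binding sites that a circRNA carries. As a rough illustration, the Python sketch below counts canonical seed-match sites (complementary to miRNA positions 2-8) on a circular sequence, including sites that span the back-splice junction and would be missed on the linear representation. The miRNA and circRNA sequences are made up for the example, the function names are hypothetical, and seed matching alone is a simplification of real target prediction.

```python
def seed_site(mirna: str) -> str:
    """Return the RNA motif complementary to miRNA positions 2-8 (the seed)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    seed = mirna.upper().replace("T", "U")[1:8]
    return "".join(pairs[base] for base in reversed(seed))


def count_sites_circular(circ_seq: str, mirna: str) -> int:
    """Count seed-match sites on a circular RNA.

    The sequence is treated as a ring: its last base is joined to its first
    base, so windows that cross this back-splice junction are also checked.
    """
    motif = seed_site(mirna)
    rna = circ_seq.upper().replace("T", "U")
    # Append the first len(motif) - 1 bases so junction-spanning windows exist.
    extended = rna + rna[:len(motif) - 1]
    return sum(
        1 for i in range(len(rna)) if extended[i:i + len(motif)] == motif
    )


# Toy sequences (hypothetical, not taken from any database).
toy_mirna = "AGCUGACCAUGGCAUUCGAUGC"
toy_circ = "CAGCAAAAGGUCAGCAAAAGGU"  # one internal site, one across the junction

print(seed_site(toy_mirna))
print(count_sites_circular(toy_circ, toy_mirna))
```

Counting on the linear string alone finds only the internal site, which is why junction-aware scanning matters for circular molecules.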

DETECTION OF CircRNAs

Methods Based on RNA Sequencing Data

Since Salzman et al. first reported 80 circRNAs found using RNA-Seq data in 2012, next-generation sequencing technology has been continuously improved, and multiple circRNA detection tools have been developed [7]. RNA next-generation sequencing not only generates accurate, high-throughput information but can also identify new circRNAs. It is cost-effective, and together with the rapid development of bioinformatics it has increased the number of circRNAs found in mammals, humans, and plants [35, 36]. Unlike linear RNA, circRNA has a closed-loop structure that makes it more stable, because it is not easily degraded by RNase R. Current studies have found that circRNA molecules are rich in miRNA binding sites, which can sequester miRNAs and decrease their inhibitory effects on target genes. Detection and sequencing of circRNAs can be improved by enriching low-abundance circRNAs, which can be achieved by removing rRNAs and digesting linear RNAs. After a sequence database of circRNAs has been constructed, analyses of distribution, annotation, expression, conservation, and interaction with miRNAs can be carried out to further explore the functions of these circRNAs. At present, there are relatively few studies and comprehensive annotations of circRNAs. Because traditional detection technologies could detect only known circRNAs, information on a large number of specifically expressed circRNAs is missing. Next-generation sequencing data are used to identify circRNAs with software that can both identify known circRNAs and predict new ones (Table ). RNA next-generation sequencing makes it possible to sequence hundreds of millions of short fragments, and single-base analysis of these fragments provides a method for accurate and effective identification of circRNAs. Various software packages can be used to detect circRNAs from RNA-Seq data.
KNIFE, NCLscan, and PTES Finder rely on gene annotation information to construct circRNA sequences for circRNA detection [39, 44, 45]. Find-circ and UROBORUS work on the reads that fail to map to the genome: they extract 20 bp anchors from the first and last ends of the unmapped reads, map the anchors, and then infer back-splicing events from the location information of these anchors [9, 41]. NCLscan and PTES Finder use the estimated split-anchor location information to create the putative circRNA sequence [44, 45]. CircRNA-finder, CIRCexplorer, DCC, MapSplice, and Segemehl detect and analyze splicing events with spliced-alignment algorithms [38, 40, 42, 43, 46]. CircRNA-finder requires paired-end sequencing data and relies on the RNA-Seq spliced-alignment software STAR [49]. PcircRNA-Finder is software designed specifically for identifying plant circRNAs; it distinguishes and predicts the downstream donors and upstream acceptors of potential circRNAs in plants with high sensitivity [50]. Although several software packages and algorithms have been developed for circRNA detection using RNA next-generation sequencing data, there is no uniform benchmarking standard, and these tools still have considerable room for improvement.
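The anchor strategy described for Find-circ and UROBORUS can be sketched in a few lines of Python. This is a toy model, not the algorithm of either tool: anchors are placed by exact string search instead of a read aligner, the reference is a random sequence, and the function name `find_backsplice` is hypothetical. It shows only the core signal, namely that the two anchors of a back-splice junction read map in reversed genomic order.

```python
import random


def find_backsplice(read: str, genome: str, anchor_len: int = 20):
    """Map the first and last `anchor_len` bases of a read by exact match
    and report whether they align in reversed genomic order.

    Returns None when the read is too short or an anchor does not map.
    """
    if len(read) < 2 * anchor_len:
        return None
    head, tail = read[:anchor_len], read[-anchor_len:]
    head_pos, tail_pos = genome.find(head), genome.find(tail)
    if head_pos < 0 or tail_pos < 0:
        return None
    # Linear transcript: head anchor lies upstream of the tail anchor.
    # Back-splice junction: head anchor lies downstream of the tail anchor.
    return {
        "head_pos": head_pos,
        "tail_pos": tail_pos,
        "backsplice": head_pos > tail_pos,
    }


# Toy reference (random, hypothetical) with a circularized exon at 100..200.
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(300))

# A junction read: the end of the exon followed by the start of the exon.
junction_read = genome[170:200] + genome[100:130]
# A linear read taken from inside the same exon.
linear_read = genome[100:160]

bsj = find_backsplice(junction_read, genome)
linear = find_backsplice(linear_read, genome)
print(bsj)
print(linear)
```

Real pipelines additionally extend the anchors to the exact breakpoint and check the flanking GU/AG splice signals before a candidate is reported.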

Machine Learning Methods Based on Sequence Feature

Existing circRNA recognition tools that rely on RNA next-generation sequencing data often suffer from limited accuracy, low reproducibility between methods, and high false-positive or false-negative rates [51]. To address this problem, a growing number of researchers use machine learning to build models that distinguish circRNAs from the inherent characteristics of sequences, without relying on sequencing data (Table ). In recent years, the combination of sequence features and machine learning has been successfully applied to the prediction of gene regulatory sites and splicing sites [52]. The contextual regression model is trained to predict circRNA formation at a random genomic locus in the human genome, with potential circRNA biogenesis factors as the features of the training data. The features are fed into a neural network that generates a contextual weight for each feature, representing its importance. The features are then multiplied by their corresponding weights, which makes the samples easier to separate, and the weighted features are summed to obtain the prediction [53]. Liu et al. uncovered a potential new link between circRNA biogenesis and flanking CpG islands, which suggests a potential correlation between DNA methylation and circRNA biogenesis. The HELM method learns a sparse encoder in an unsupervised way and transforms the original input into a higher-level representation. A random perturbation of the output matrix serves as the input for supervised feature classification, which uses only one hidden layer. The goal is to minimize the training error and the norm of the output weights [54]. Chaabane et al. proposed a new method, circDeep, which fuses an RCM descriptor, an ACNN-BLSTM sequence descriptor, and a conservation descriptor into high-level abstraction descriptors in which the shared representations across different modalities are integrated.
The next steps are to apply feature fusion learning to the heterogeneous descriptors, train a single DNN, and finally predict circRNAs [55]. PredcircRNA focuses on distinguishing circRNAs from other lncRNAs through multiple kernel learning: it first extracts features from different sources of the transcript and then fuses these heterogeneous features with a method based on multiple kernel learning [56]. For WebCircRNA, Pan et al. extracted different features, such as sequence composition, RNA secondary structure, and conservation, to train a random forest model and used five-fold cross-validation to assess its performance [57]. CircLGB is a machine-learning framework built around the LightGBM classifier and has five major procedures: (1) collecting human circRNA and lncRNA transcripts, which are combined to construct the circlncRNA dataset; (2) extracting four groups of sequence-derived features; (3) sorting the extracted features by importance score and using sequential forward search (SFS) to determine the optimal feature subset; (4) training the LightGBM classifier for circRNA identification with the optimal feature subset; and (5) calculating the performance metrics for model evaluation [58]. Niu et al. developed a new classifier, CirRNAPL, which extracts nucleic acid composition and structure features from the circRNA sequence and optimizes an extreme learning machine (ELM) with the particle swarm optimization (PSO) algorithm. The ELM algorithm randomly assigns input weights and hidden-layer thresholds and directly calculates the output-layer weights by least squares; PSO is used to optimize the input weights and hidden-layer biases of the ELM, which can improve the generalizability of the method [59].
DeepciRGO uses the dependencies between GO classes as background information to predict circRNA functions by integrating multiple interactions and associations. The first step is to extract the topological information of each node in the global network as its feature; a neural network is then built for each GO class, taking into account the functional dependencies between GO classes [60]. Because circRNAs are expressed at relatively low levels, detection tools that use different strategies can produce very different results. In addition, circRNA levels differ greatly among cell lines, which indicates that circRNA expression is regulated differently in different cell lines. However, machine learning methods based on sequence features model only the expression status of circular RNA and do not consider its expression level. Therefore, machine learning methods still have room for development in the study of circular RNA. For example, Chen et al. modeled circRNA expression patterns using RNA-seq data together with integrated sequence and epigenetic features and demonstrated the potential involvement of H3K79me2 in circRNA expression [61]. Integrating sequence features with RNA sequencing data may yield a more comprehensive understanding of circular RNA in subsequent research.
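The weighting-and-summing step of the contextual regression model [53] can be made concrete with a minimal Python sketch. It is an assumption-laden toy, not the published model: the "context network" here is a single tanh layer, and the feature names and parameter values are hypothetical placeholders rather than learned quantities. It shows only the forward pass, in which each feature receives an input-dependent weight and the weighted features are summed.

```python
import math


def context_weights(features, weight_matrix, bias):
    """Toy one-layer context network: one weight per feature, each computed
    from the whole feature vector and squashed into (-1, 1) by tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(weight_matrix, bias)
    ]


def contextual_predict(features, weight_matrix, bias):
    """Return (score, weights), where score = sum_i weight_i(x) * x_i."""
    weights = context_weights(features, weight_matrix, bias)
    score = sum(w * x for w, x in zip(weights, features))
    return score, weights


# Hypothetical, already-scaled features for one candidate genomic locus.
feature_names = ["flanking_rcm_count", "flanking_intron_length", "cpg_island_nearby"]
features = [0.8, 0.5, 1.0]
# Hypothetical parameters; a real model would learn these from training data.
weight_matrix = [[0.5, 0.0, 0.2], [0.0, -0.3, 0.0], [0.1, 0.0, 0.4]]
bias = [0.0, 0.1, -0.2]

score, weights = contextual_predict(features, weight_matrix, bias)
for name, weight in zip(feature_names, weights):
    print(f"{name}: contextual weight {weight:+.3f}")
print(f"prediction score: {score:+.3f}")
```

Because the weights depend on the input, they can be read per sample as a local measure of feature importance, which is the property the text attributes to the method.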

INFLUENCE OF circRNAs ON DISEASES

The human genome can be widely transcribed to generate a large number of non-coding RNAs, such as miRNA, lncRNA, piRNA, circRNA, and others. Many of these are closely related to the occurrence and development of diseases [62]. Recent studies have shown that circRNAs play an important role in diseases, such as cardiovascular diseases, diabetes mellitus, cancer, and plant-related diseases such as potato bacterial soft rot [27].

Association Between CircRNAs and Diseases

Based on studies of the structure and function of circRNAs, researchers found that they play a very important role in atherosclerosis, nervous system disorders, diabetes mellitus, and cancer [2]. The circular antisense noncoding RNA in the INK4 locus (cANRIL) is an antisense transcript of the gene encoding the cyclin-dependent kinase 4 inhibitor protein and its alternative reading frame (INK4A/ARF) [63]. SNPs can affect cANRIL production, thus affecting cANRIL splicing. In turn, cANRIL can affect the repression of the INK4a/ARF gene that is mediated by the Polycomb group of proteins, thereby increasing the risk of atherosclerosis [64]. Expression of the circRNA CDR1as was first identified in brain tissue 20 years ago, and the CDR1 gene was once considered one of the two important genes involved in autoimmune nervous system disorders [65]. The ability of CDR1as to recruit and interact with miR-7 is the most important basis for its involvement in the pathogenesis of diseases. It can indirectly regulate the expression of miR-7 targets, thus affecting the occurrence and development of diseases. In an Alzheimer's disease (AD) study, dysregulation of the miRNA-circRNA system was found in the hippocampal CA1 region. The loss of ciRS-7 decreased the “miRNA sponge” activity and increased the miR-7 level around brain cells, which is one of the mechanisms associated with AD pathogenesis [66]. MiR-7 can directly regulate α-synuclein expression and play a role in the pathogenesis of Parkinson's disease [67]. In pancreatic β cells, miR-7 inhibitors can activate the mammalian target of rapamycin (mTOR) signaling pathway and thereby stimulate cell proliferation, indicating that miR-7 may be a therapeutic target for diabetes mellitus [68]. Other studies have shown that miR-7 acts as a tumor suppressor and that its dysfunction and low expression participate in the occurrence and development of different malignant tumors, including breast, liver, and cervical cancers [69].
In addition, it can inhibit the growth of A549 human non-small cell lung cancer cells by regulating the expression of the apoptosis-related gene B-cell lymphoma 2 (BCL-2) [70]. To reveal the relationship between circRNA and aging, researchers studied circRNA expression in different tissues and organs of mice at different growth stages. In the mammary gland, only circUSP3 expression correlated negatively with aging, while in the intestine, the three types of growth-related circRNAs correlated positively. This suggests that circRNAs may be involved in different senescence pathways and play different biological roles [71]. CircRNAs affect plant growth. In kiwifruit, circRNAs were found to respond specifically to pathogen invasion [72]. In potatoes, differentially expressed circRNAs responded to bacterial soft rot infection [73]. Several specific circRNAs were found in tomato leaves infected with yellow leaf curl virus disease [74]. CircRNAs were also differentially expressed in tomato fruits during their development, which indicated that circRNAs could be involved in fruit ripening and coloration [75]. At present, we still do not have a complete understanding of the functions of circRNAs in health and disease. Their biological functions and clinical value need to be further explored, but research on circRNAs clearly has a bright future for the prevention and treatment of diseases.

Prediction Method of Diseases Associated circRNA

Thanks to the prediction of associations between diseases and circRNAs, many diseases have now been confirmed to be related to circRNAs. Because circRNAs can adsorb miRNAs in cells and act as miRNA sponges, a heterogeneous information network of circRNAs, miRNAs, and diseases can be constructed to explore their relationships. In the first step, network construction, a circRNA co-expression network was built using the ENCODE data set and the scale-free characteristics of biological networks. In addition, a miRNA functional similarity network was constructed using data from a miRNA target gene database. The miRNA functional similarity network was verified, and the disease-related miRNA network and human disease network were obtained after cleaning and analyzing data from existing databases and the literature. In the second step, a meta-path-based approach was used to obtain the topological features from circRNA to disease, with path count and random walk measuring the degree of correlation between nodes. Finally, machine learning methods (the Katz method, a one-class support vector machine, a category weighting strategy, and a bagging ensemble learning strategy) were used to verify the association between circRNA and disease [76]. Wang et al. used a deep convolutional neural network method based on multi-source information to predict circRNA-disease associations. This method first assumed that similar circRNAs are usually associated with diseases that have similar phenotypes. Based on this assumption, a numerical descriptor was constructed from the circRNA similarity network and the disease semantic similarity network. Then, using the biological information of circRNAs and diseases, the Gaussian interaction profile kernel similarity network was introduced into the descriptor.
Multi-source information, namely disease semantic similarity, disease Gaussian interaction profile kernel similarity, and circRNA Gaussian interaction profile kernel similarity, was then fused. Next, a convolutional neural network (CNN) was used to automatically and objectively extract deep features from the circRNA-disease descriptor. Finally, an extreme learning machine classifier was used to quickly and accurately predict potential circRNA-disease associations [77]. Another way to determine circRNA-disease associations is based on matrix factorization. First, potential circRNA-disease associations were calculated from the circRNA similarity and the disease similarity extracted from disease semantic information. Known circRNA-gene, gene-disease, and circRNA-disease associations were also extracted. Then, the circRNA-disease interaction profiles were updated using the interaction profiles of neighbors to correct for false-negative associations. Finally, the updated circRNA-disease interaction profiles were decomposed by matrix factorization to predict circRNA-disease associations [78].
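The Gaussian interaction kernel similarity used in the descriptor above is commonly known as the Gaussian interaction profile (GIP) kernel. The Python sketch below computes it for circRNAs from a binary circRNA-disease association matrix, using the widely used definition K(i, j) = exp(-γ ‖IP(i) − IP(j)‖²) with γ = γ′ divided by the mean squared norm of the profiles. Whether [77] uses exactly this normalization is an assumption, and the function name and toy matrix are hypothetical.

```python
import math


def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between the rows of a
    binary association matrix (row i = interaction profile of circRNA i)."""
    n = len(assoc)
    squared_norms = [sum(v * v for v in row) for row in assoc]
    mean_squared_norm = sum(squared_norms) / n
    if mean_squared_norm == 0:
        raise ValueError("association matrix contains no known associations")
    # Bandwidth is normalized by the average number of associations per row.
    gamma = gamma_prime / mean_squared_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Toy association matrix (hypothetical): 3 circRNAs x 3 diseases.
associations = [
    [1, 1, 0],  # circRNA 0
    [1, 1, 0],  # circRNA 1: same profile as circRNA 0
    [0, 0, 1],  # circRNA 2: disjoint profile
]
similarity = gip_kernel(associations)
for row in similarity:
    print([round(value, 3) for value in row])
```

Applying the same function to the transposed matrix gives the disease-side similarity, so one routine covers both kernels mentioned in the text.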

DISCUSSION AND PROSPECT

Through in-depth study of next-generation sequencing data and the use of machine learning methods, our understanding of circRNAs has gradually deepened. In this review, we briefly describe the biogenesis and functions of circRNAs. In addition, we introduce two important strategies for identifying circRNAs: sequence features and RNA next-generation sequencing data. Next-generation sequencing has prompted the development of a variety of identification software, and DNA sequence features have been rapidly applied to the identification of circRNAs, which has become a new hot spot in bioinformatics. With the development of technologies for identifying circRNAs, the relationship between circRNAs and diseases has gradually become apparent. From the discovery of circRNA in plant viruses in 1976 to human tumors and cell senescence, and to plant soft rot disease and drought response, knowledge about circRNAs as disease factors has steadily accumulated. In addition, algorithms for identifying disease-related circRNAs will greatly benefit contemporary medicine as correlations between circRNAs and diseases are gradually discovered. Although significant progress has been made in the study of circRNA, many problems remain to be solved. No unified mechanism explains the biogenesis of all the circRNAs discovered so far [53, 79]. Algorithms for predicting circRNAs generate results that deviate widely from one another, and there is no unified standard for evaluating their accuracy. Therefore, it is still difficult to detect circRNAs in the genome with sufficient sensitivity and specificity [80]. The degradation of circular RNA is even less understood than its biogenesis. How are circRNAs degraded to maintain a balance in organisms [27]?
In recent years, many studies have been conducted on human circRNAs, but much less work has been done on the mechanisms of formation, localization and degradation of plant circRNAs. In addition, specific identification tools for plant circRNAs are lacking.

CONCLUSION

In recent years, studies on plant growth and development, abiotic stress, ncRNAs, and regulatory networks have become more specific and better documented. Prediction algorithms based on experimental techniques and experimental ideas will continue to be developed and improved. Owing to the stability, tissue specificity, and high expression levels of circRNAs, a variety of known circRNAs have been associated with diseases, which makes them more likely to become useful biomarkers and therapeutic targets in human diseases. In addition, circRNAs are modulated by environmental stimuli, causing a variety of diseases. Therefore, circRNAs could become important biomarkers for studying the relationship between environmental stimuli and the induction of human diseases. Today, the relationships between circRNAs and the pathogenesis of diseases are still unclear, but we believe that in the near future, their functions and relationships with diseases will be more thoroughly understood [81].

Table 1

Detection methods of circRNAs based on sequencing data.

Method	Approach	Dataset or Database	Language	URL
Find-circ [9]	Find-circ takes 20 bp anchors from both ends of the reads that are not mapped and maps them to the genome again. Next, GU/AG splice sites are found by short-sequence alignment to infer the potential circular RNA sequence.	human hg19, mouse mm9, C. elegans ce6, UCSC genome browser	python	https://www.nature.com/articles/nature11928/
CIRI [37]	The fasta sequence of the input genome is compared with the same file generated by the sequencing data to detect junction reads: paired chiastic clipping signals at the junction points of the covered circular RNA. Compare junction reads. The conservative splicing sites of PEM and GT-AG are filtered, and the dynamic programming algorithm performs the final circular RNA prediction.	ENCODE RNA-seq data	perl	https://doi.org/10.1186/s13059-014-0571-3
DCC [38]	DCC uses the output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates.	rRNA-depleted total RNA-seq data	python	https://doi.org/10.1093/bioinformatics/btv656
KNIFE [39]	Detect and quantify circular and linear RNA splicing events at the annotated and unannotated exon boundaries, including intergenic regions of the genome, thereby improving the sensitivity and specificity of circular RNA detection.	ENCODE poly(A) + and poly(A) - RNA-Seq data	Python/perl	https://link.springer.com/article/10.1186/s13059-015-0690-5
CIRCexplorer [40]	Identify junction reads from back-spliced exons, realign them against existing gene annotations, determine the precise locations of the downstream donor and upstream acceptor splice sites, and use a custom algorithm to correct mapping errors based on RefSeq exon annotations.	RNase R-treated RNA-seq from H9 human embryonic stem cells (hESCs)	python	https://doi.org/10.1016/j.cell.2014.09.001
UROBORUS [41]	The artificial paired-end seed (20 bp) is first extracted from two ends of reads in an unmapped.sam file, and then aligned to the reference genome. The UROBORUS pipeline designed algorithm to deal with BMJ and UMJ reads, and detect more circRNA supported reads.	RNA-seq data of glioma samples in Hg19	perl	https://doi.org/10.1093/nar/gkw075
MapSplice [42]	In the ‘tag alignment’ phase, candidate alignments of the mRNA tags to the reference genome G are determined. In the ‘splice inference’ phase, splice junctions that appear in the alignments of one or more tags are analyzed to determine a splice significance score based on the quality and diversity of alignments that include the splice.	synthetic noise-free RNA-seq data	python	https://doi.org/10.1093/nar/gkq622
Segemehl [43]	Segemehl is a single-end RNA-seq data segmentation read mapping algorithm, which combines seed mapping based on error ESA and fast bit vector comparison. It can accommodate multiple splits in one read and does not make a priori assumptions about the transcript structure. It is implemented in the Segemehl mapping tool, which can easily identify conventional splice junctions, collinear and non-collinear fusion transcripts, and trans-spliced RNA	RefSeq database Drosophila RNA-seq dataset human melanoma transcriptome dataset	python	https://link.springer.com/article/10.1186/gb-2014-15-2-r34
NCL scan [44]	Map RNA sequence reads withreference genomes and knowntranscripts, eliminate collinearity matching reads, and connect the two ends of each unmapp-ed read to generate a continuo-us sequence. Use BLAT to aligneach linked sequence with thereference genome. According to the corresponding BLATcomparison results andGENCODE comments, use theassumed NCL connection points to make a “hypothetical NCLreference”	hg19/GRCh37	python	https://doi.org/10.1093/nar/gkv1013
PTES Finder [45]	PTES Finder identifies putative PTES structures by mapping RNA-seq reads to sequence models generated using existing transcript annotation. It then applies a series of mapping and alignment filters to systematically remove known classes of false positives.	sample SRR4497 5A in human fibroblast total RNA	perl	http://link.springer.com/article/10.1186/s12859-016-0949-1
CircRNA-finder [46]	Reads are first aligned to the reference genome with STAR; circRNA-finder is then run on the alignments to report circular RNAs whose splice sites match the GT-AG splicing signal	Drosophila total RNA-sequencing data	perl	https://doi.org/10.1016/j.celrep.2014.10.062
CircRNAFisher [47]	CircRNAFisher is a systematic computational pipeline for de novo genome-wide circRNA identification and annotation. It combines a BSJ search with a series of statistical filters to detect candidate circRNAs. It can also combine the reads that overlap a BSJ with the reads that are inconsistent with it to estimate a P value for each identified circRNA.	A549 and MCF7 RNA-Seq data from the ENCODE project; human hg19 and UCSC database	perl	https://www.nature.com/articles/s41401-018-0063-1
PcircRNA-finder [48]	PcircRNA-finder collects all back-splice sites through chiastic clipping mapping of paired-end (PE) reads, building on the main available fusion detection methods.	rRNA-depleted/RNase R-treated RNA-Seq data	python	https://doi.org/10.1093/bioinformatics/btw496
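
Most of the tools above share one core idea: a read supports a circRNA when its later part aligns upstream of its earlier part, and several tools additionally require the canonical GT-AG signal at the junction. The following is a minimal sketch of that idea only, not the implementation of any listed tool. The `Segment` type, the function names, and the toy genome are hypothetical, coordinates are assumed to be 0-based and half-open, and only the plus strand is handled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a split read (hypothetical type for this sketch)."""
    read_offset: int  # position of this piece within the read
    chrom: str
    start: int        # 0-based, inclusive genome coordinate
    end: int          # 0-based, exclusive genome coordinate


def is_backsplice(first: Segment, second: Segment) -> bool:
    """True when the later piece of the read maps upstream of the earlier piece."""
    return (
        first.read_offset < second.read_offset
        and first.chrom == second.chrom
        and second.start < first.start
    )


def has_gt_ag(genome: Dict[str, str], first: Segment, second: Segment) -> bool:
    """Check the canonical signal: GT after the donor, AG before the acceptor."""
    seq = genome[first.chrom]
    if second.start < 2:
        return False  # no room for an upstream AG dinucleotide
    donor = seq[first.end:first.end + 2]
    acceptor = seq[second.start - 2:second.start]
    return donor == "GT" and acceptor == "AG"


def call_backsplice(
    genome: Dict[str, str], first: Segment, second: Segment
) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, acceptor_start, donor_end) for a supported junction, else None."""
    if is_backsplice(first, second) and has_gt_ag(genome, first, second):
        return (first.chrom, second.start, first.end)
    return None


if __name__ == "__main__":
    # Toy genome: intron "CCAG", exon "AAAACCCC" (positions 4..12), intron "GTTT".
    toy_genome = {"chrT": "CCAGAAAACCCCGTTT"}
    # A read "CCCCAAAA" crosses the junction: exon end first, exon start second.
    tail = Segment(read_offset=0, chrom="chrT", start=8, end=12)
    head = Segment(read_offset=4, chrom="chrT", start=4, end=8)
    print(call_backsplice(toy_genome, tail, head))
```

Real pipelines work on aligner output (for example split or chimeric alignments) and add further filters such as read-count thresholds and annotation checks, which this sketch omits.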

Table 2

Sequence feature approaches for circRNA classification

Method	Approach	Dataset or Database	Language	URL
Context regression model [53]	The features are fed into a neural network that generates a contextual weight for each feature, representing its importance. Features are then ordered by their weights so that samples are easier to separate. For classification or regression tasks, the weighted features are summed to yield the prediction.	CircNet	python	https://doi.org/10.1093/bioinformatics/btz705
HELM [54]	HELM first learns a sparse encoder in an unsupervised way, transforming the original input into a higher-level representation. A randomly perturbed version of the encoder output is then passed to the supervised feature classification stage, which has only one hidden layer and is trained to minimize the norms of the training error and the output weights.	circBase and GENCODE	python	https://doi.org/10.1007/s00438-017-1372-7
circDeep [55]	Constructs RCM descriptors that capture the potential of the flanking and query sequences to circularize, and integrates ACNN and BLSTM to process the different input data types. Shared representations across modalities are obtained by fusing the RCM descriptors, ACNN-BLSTM sequence descriptors, and conservation descriptors into high-level abstract descriptors. Feature-fusion learning is applied to the heterogeneous descriptors, a single DNN is trained, and circRNAs are finally predicted.	GENCODE and human circRNAs from the circRNADb database	python	https://doi.org/10.1093/bioinformatics/btz537
PredcircRNA [56]	PredcircRNA distinguishes circRNAs from other lncRNAs through multiple kernel learning. It first extracts discriminative features from different sources in the transcript and then fuses these heterogeneous features within a multiple kernel learning framework. Performance is assessed by 5-fold cross-validation.	Human circRNA data from the circBase database and GENCODE	python	https://doi.org/10.1039/c5mb00214a
WebCircRNA [57]	WebCircRNA is a user-friendly web server that predicts if coding and noncoding RNAs have circRNA isoforms and whether circRNAs are expressed in stem cells. The predictions are made by random forest models using sequence-derived features as input. The output scores are converted to fractiles, which are used to assess the circRNA and stem cell potential.	CircBase and GENCODE v19	python	https://doi.org/10.3390/genes9110536
CircLGB [58]	CircLGB is a machine learning-based framework that discriminates circRNAs from other lncRNAs. It integrates commonly used sequence-derived features with three new features: adenosine-to-inosine (A-to-I) deamination, A-to-I density, and the internal ribosome entry site. CircLGB classifies circRNAs using a LightGBM classifier with feature selection.	CircBase, LNCipedia, CircRNADb and GENCODE	python	https://doi.org/10.3389/fgene.2020.00655
CirRNAPL [59]	First, the method extracts nucleic acid composition and structure features from the circRNA sequence. Four feature types are extracted: ribonucleic acid composition, autocorrelation, pseudo-ribonucleic acid composition, and predicted structure composition. An extreme learning machine optimized by particle swarm optimization is then used as the classification algorithm. The CirRNAPL classifier is built using tenfold cross-validation and is used to classify unlabeled RNA sequences.	CircBase and GENCODE v19	Java	https://doi.org/10.1016/j.csbj.2020.03.028
DeepciRGO [60]	DeepciRGO predicts the function of circRNAs by integrating multiple types of circRNA-related biological information. First, it constructs a global heterogeneous network from circRNA co-expression, circRNA-protein associations, and protein-protein interactions. The latent topological features of the global network are then extracted with HIN2Vec and fed into a deep neural network classifier. Finally, circRNAs are annotated with GO terms by the trained classifier.	StarBase v2.0 and CSCD	python	https://doi.org/10.1186/s12859-020-03748-3
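
Several of the classifiers above take sequence-derived features as input. As a hedged illustration of what such a feature can look like, the sketch below computes normalized k-mer frequencies for an RNA sequence. It is a generic example rather than the feature set of any listed method; the function name `kmer_frequencies` and the default `k = 2` are assumptions made for this example.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_frequencies(seq: str, k: int = 2) -> Dict[str, float]:
    """Return the relative frequency of every possible k-mer over A, C, G, U.

    DNA input is accepted and converted to RNA (T -> U). Windows that contain
    other symbols (for example N) are ignored when normalizing.
    """
    rna = seq.upper().replace("T", "U")
    all_kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(rna[i:i + k] for i in range(len(rna) - k + 1))
    total = sum(counts[kmer] for kmer in all_kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


if __name__ == "__main__":
    features = kmer_frequencies("ACGUACGUAAGG", k=2)
    # The fixed key order gives a feature vector of length 4**k per sequence.
    print([round(features[kmer], 3) for kmer in sorted(features)])
```

A vector like this can be passed to any standard classifier; the methods in Table 2 combine such composition features with structural, conservation, or network-derived descriptors.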

77 in total

Review 1. Regulation of circRNA biogenesis.

Authors: Ling-Ling Chen; Li Yang
Journal: RNA Biol Date: 2015 Impact factor: 4.652

Review 2. [Biogenesis, research methods, and functions of circular RNAs].

Authors: Xu Qing Liu; Yu Bang Gao; Liang Zhen Zhao; Yu Chen Cai; Hui Yuan Wang; Miao Miao; Lian Feng Gu; Hang Xiao Zhang
Journal: Yi Chuan Date: 2019-06-20

3. CircRNAFisher: a systematic computational approach for de novo circular RNA identification.

Authors: Guo-Yi Jia; Duo-Lin Wang; Meng-Zhu Xue; Yu-Wei Liu; Yu-Chen Pei; Ying-Qun Yang; Jing-Mei Xu; Yan-Chun Liang; Peng Wang
Journal: Acta Pharmacol Sin Date: 2018-07-16 Impact factor: 6.150

4. Translation of CircRNAs.

Authors: Nagarjuna Reddy Pamudurti; Osnat Bartok; Marvin Jens; Reut Ashwal-Fluss; Christin Stottmeister; Larissa Ruhe; Mor Hanan; Emanuel Wyler; Daniel Perez-Hernandez; Evelyn Ramberger; Shlomo Shenzis; Moshe Samson; Gunnar Dittmar; Markus Landthaler; Marina Chekulaeva; Nikolaus Rajewsky; Sebastian Kadener
Journal: Mol Cell Date: 2017-03-23 Impact factor: 17.970

Review 5. A comprehensive overview and evaluation of circular RNA detection tools.

Authors: Xiangxiang Zeng; Wei Lin; Maozu Guo; Quan Zou
Journal: PLoS Comput Biol Date: 2017-06-08 Impact factor: 4.475

6. MVSC: A Multi-variation Simulator of Cancer Genome.

Authors: Ning Li; Jialiang Yang; Wen Zhu; Ying Liang
Journal: Comb Chem High Throughput Screen Date: 2020 Impact factor: 1.339

7. NCLscan: accurate identification of non-co-linear transcripts (fusion, trans-splicing and circular RNA) with a good balance between sensitivity and precision.

Authors: Trees-Juen Chuang; Chan-Shuo Wu; Chia-Ying Chen; Li-Yuan Hung; Tai-Wei Chiang; Min-Yu Yang
Journal: Nucleic Acids Res Date: 2015-10-05 Impact factor: 16.971

8. Diminished parkin solubility and co-localization with intraneuronal amyloid-β are associated with autophagic defects in Alzheimer's disease.

Authors: Irina Lonskaya; Ashot R Shekoyan; Michaeline L Hebron; Nicole Desforges; Norah K Algarzae; Charbel E-H Moussa
Journal: J Alzheimers Dis Date: 2013 Impact factor: 4.472