
Comprehensive analysis of circRNA expression profiles in humans by RAISE.

Lin Li¹, Yong-Chang Zheng², Masood Ur Rehman Kayani¹, Wen Xu¹, Guan-Qun Wang¹, Pei Sun¹, Ning Ao³, Li-Na Zhang⁴, Zhao-Qi Gu², Liang-Cai Wu², Hai-Tao Zhao².

Abstract

Circular RNAs (circRNAs) are pervasively expressed circular non-coding RNAs. Even though many circRNAs have been reported in humans, their expression patterns and functions remain poorly understood. In this study, we employed a pipeline named RAISE to detect circRNAs in RNA-seq data. RAISE can fully characterize circRNA structure and abundance. We evaluated inter-individual variations in circRNA expression in humans by applying this pipeline to numerous non-poly(A)-selected RNA-seq datasets. We identified 59,128 circRNA candidates in 61 human liver samples, with almost no circRNAs shared across all of the recruited samples. Approximately 89% of the circRNAs were detected in only one or two samples. In comparison, 10% of the linear mRNAs and non-coding RNAs were detected in every sample. We further estimated the variation in other tissues, especially tissues with high circRNA abundance. Only 0.5% of the 50,631 brain circRNA candidates were shared among the 30 recruited brain samples, which is similar to the proportion in liver. Moreover, we found inter- and intra-individual diversity in circRNA expression in granulocyte RNA-seq data from seven individuals sampled 3 times at one-month intervals. Our findings suggest that careful consideration of inter-individual diversity is required when extensively identifying human circRNAs or proposing their use as potential biomarkers and therapeutic targets in disease.

Year: 2017 PMID: 29039477 PMCID: PMC5673025 DOI: 10.3892/ijo.2017.4162

Source DB: PubMed Journal: Int J Oncol ISSN: 1019-6439 Impact factor: 5.650

Introduction

Circular RNAs (circRNAs) are a recently rediscovered class of non-coding RNA (1,2). They are formed by backsplicing events in which a downstream 5' splice donor site is joined to an upstream 3' splice acceptor site in the primary transcript. Since the discovery of the first two circular RNAs (DCC in humans and SRY in mice) in the 1990s (3,4), numerous circRNAs have been identified in silico and validated in experiments (1,2). CircRNAs are remarkably stable, conserved, highly abundant and predominantly cytoplasmic (2). They are generated through several distinct mechanisms that rely on complementary sequences within flanking introns (2,5,6), exon skipping (6,7), and exon-containing lariat precursors (8). CircRNA expression is regulated by RNA editing enzymes and RNA binding proteins, such as ADAR (6) and Quaking (9). Similar to linear RNA, circRNAs are generated from exons or introns at canonical splice sites and require the typical spliceosomal machinery (10–12). Computational biologists have developed several alignment algorithms to identify circRNAs using RNA-seq data. Two main approaches are used to detect circRNAs. The first approach uses the annotated genome to build a reference database of scrambled exon-exon junctions. The scrambled exome includes all possible pairs of intragenic exons in a non-canonical order as well as the circularization of a single exon. Backsplice junction reads are then aligned contiguously along their full length to this database; KNIFE and other pipelines take this approach (13,14). The second approach modifies the alignment algorithms and pipelines to identify backsplice reads aligned directly to the genome or transcriptome; examples include mapsplice, find_circ, segemehl, circExplorer, circRNA_finder, CIRI, DCC and acfs (5,15–21). These algorithms differ in accuracy and sensitivity, and there is little overlap in their predictions (22). Several studies have revealed that circRNAs are substantially enriched in the brain tissues of humans and mice.
Of note, the expression levels of circRNAs are dynamic during brain development and are independent of the linear transcript that originates from the same gene locus (23–25). In epithelial ovarian carcinoma, circRNAs display altered expression patterns between primary ovarian tumors and metastatic tumors (14). Among heart-specific circRNA candidates, no differential expression was observed between normal and diseased human hearts (26). CircRNAs may also have an impact on aging and multiple disorders. CDR1as (ciRS-7) can serve as a miRNA 'sponge', arresting miR-7 function, and miR-7 is a vital regulatory miRNA in Parkinson's disease (16). Genome-wide association studies linked a newly identified circRNA species, called cANRIL, with atherosclerosis risk (27). To systematically investigate the intra- and inter-individual variation in circRNA expression profiles and the role of circRNAs in humans, we collected a large set of total RNA-seq data from the NCBI Sequence Read Archive and ENCODE (28,29). A pipeline named RAISE (circRNA ReAlign Internal Structure and Expression) was developed to analyze circRNA candidates in these samples. Using RAISE, we identified 59,128 circRNA candidates in HCC and adjacent non-tumor tissues. Only a small portion of the circRNAs is universally expressed in the recruited HCC samples, and the expression of circRNAs in HCC shows inter-individual variation. We further examined whether circRNA expression also varies in other tissues, especially in tissues with high circRNA abundance. Similar to the liver, only 0.5% of the 50,631 brain circRNA candidates are shared among the 30 recruited brain samples. Moreover, we found inter- and intra-individual diversity in circRNA expression in granulocyte RNA-seq data from seven individuals sampled 3 times at one-month intervals. Our results suggest that the majority of circRNAs exhibit inter- and intra-individual variation.
When proposing variable circRNAs that are naturally highly expressed as prognostic markers, it is necessary to collect a sufficient number of individuals to confirm that these circRNAs are robustly expressed in humans.

Materials and methods

RNA-seq datasets

We downloaded publicly available human and mouse RNA-seq datasets from NCBI SRA (28) and ENCODE (29). Human samples included hepatocellular carcinoma (accession no. GSE65485) (30), granulocyte (accession no. GSE70390) (31) and brain samples (accession nos. GSE53697 and GSE71315) (32,33). Mouse samples included liver and brain (accession no. PRJEB5489) (Table I).

Table I

Summary of RNA-seq data set.

Accession no.	Liver	Brain	Granulocytes	Other	Total	Description
GSE65485	55	0	0	0	55	50 HCC and 5 adjacent non-tumor
GSE77661	4	2	0	20	26	4 liver samples (2 normal, 1 HCC, 1 HCC adjacent non-tumor tissue), 2 heart
GSE73570	0	0	0	6	6	Blood 1 repeat
ENCODE	2	3	0	38	43	2 liver samples, 3 brain samples
GSE53697	0	17	0	0	17	8 control, 9 Alzheimer's diseased
GSE71315	0	16	0	0	16	8 ribozero total RNA, 8 polyA⁺ RNA
PRJEB5489	12	12	0	0	24	Mouse normal
GSE70390	0	0	21	0	21	Human granulocytes 7x3

RAISE: the workflow of circRNA identification

RAISE is a pipeline implemented as a shell script that runs after circRNA backsplice sites have been identified. It can identify circRNA internal structures and alternative exon usage. In this study, we combined four circRNA prediction algorithms: mapsplice (2,15,34), find_circ (16), acfs (20,24), and circRNA_finder (18). The main steps of RAISE are summarized in Fig. 1. The first step is to obtain the unmapped reads: RNA-seq reads are aligned to the reference genome and transcriptome with hisat2, and reads that align contiguously or across canonical splice junctions are filtered out. In parallel, circRNA_finder is used to identify circRNAs. The second step applies mapsplice, acfs and find_circ to the unmapped reads and combines the output of these three tools with the circRNA_finder results. The third step extracts the genomic sequence of the region delimited by each circRNA backsplice site and tandemly duplicates it to create a pseudo circRNA reference. The fourth step obtains the coverage and depth of each backsplice junction site by realigning the unmapped reads from step 1 to the pseudo circRNA reference with hisat2. The fifth step is a paired-end read analysis of pairs in which one read aligns to the backsplice site. If the mate read maps within the circRNA region, the pair is considered a circRNA paired-end read; if the mate read maps outside the circRNA region, the pair is considered a decoy. The properly mapped paired-end reads and the decoy reads are then counted. The sixth step predicts the circRNA transcript from the properly mapped paired-end reads: when one read aligns to the backsplice site and its mate contains an internal splice site of the circRNA, the mate is used to detect the circRNA internal structure and exon usage. An optional step detects additional paired-end read support for circRNAs: linear discordant paired-end alignments within the genomic region of the circRNA backsplice site are potential supporting reads for circRNA candidates (Fig. 1). The pipeline is available as a git repository: https://github.com/liaoscience/RAISE.
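The pseudo-reference and mate-classification steps can be illustrated with a minimal Python sketch. This is not the RAISE shell script itself: the function names and the toy sequence are hypothetical, and the containment rule for the mate read is an assumption about how "within the circRNA region" would be tested.

```python
def make_pseudo_reference(region_seq: str) -> str:
    """Tandemly duplicate the region between the backsplice sites.

    Hypothetical helper: in the doubled sequence, the end of the region is
    followed by its start, so a junction-spanning read aligns contiguously.
    """
    return region_seq + region_seq


def classify_mate(circ_start: int, circ_end: int,
                  mate_start: int, mate_end: int) -> str:
    """Label a pair whose first read already aligns to the backsplice site.

    Assumption: the mate counts as a circRNA read only if it lies entirely
    inside the circRNA genomic interval; otherwise the pair is a decoy.
    """
    inside = circ_start <= mate_start and mate_end <= circ_end
    return "circRNA" if inside else "decoy"


# Toy region: a backsplice read joins the end of the region to its start.
region = "ACGTTGCA"
junction_read = region[-4:] + region[:4]
pseudo_reference = make_pseudo_reference(region)
print(junction_read in region)            # not found in the linear region
print(junction_read in pseudo_reference)  # found in the duplicated region
```

Duplicating the sequence lets a standard linear aligner handle junction reads, which is why the same aligner (hisat2 in the text) can be reused for the realignment step.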

Figure 1

The RAISE workflow for the identification and quantification of circRNA candidates. (A) The pipeline of circRNA detection: 1, Hisat2 alignment and filtering of mapped reads, STAR alignment and circRNA_finder identification. 2, Detection of circRNAs from unmapped reads using mapsplice, acfs and find_circ tools, and merger of their outputs with the circRNA_finder results. 3, Extraction of the genomic region corresponding to the circRNA backsplice site from the previous four tools' output, tandem duplication of the sequences to create a pseudo circRNA reference, realignment of the unmapped reads to the pseudo reference, and estimation of the abundance of circRNA candidates. 4, From the paired-end data, if one read is aligned to a circRNA backsplice site and if the mate read is aligned to the circRNA region, these paired-end reads are classified among circRNA reads, whereas if the mate read is aligned outside the circRNA backsplice site, the paired-reads are classified as decoys. (B) Prediction of circRNA internal exon usage by paired-end and splice junction reads. (C) Discordant linear alignment reads that could be potential circRNA candidate paired-end reads.

CircRNA annotation and quantification

CircRNA annotation is based on the Gencode (35) human genome (v38) and mouse genome (v10). We intersected the circRNA donor/acceptor sites with annotated gene regions, including coding RNA, non-coding RNA, intron, antisense and intergenic regions, using the BEDTools suite (v2.16.2) (36). The number of reads aligned to the circRNA-specific head-to-tail junction was used as the measure of circRNA expression. To enable comparison of relative expression among samples, counts were normalized in each library as Spliced backsplice Reads Per Billion Mapped reads (SRPBM) (37). Circular to linear ratios were calculated by dividing the number of backsplice reads of a circRNA by the mean number of reads spanning the linear splice junctions on the left and right sides of the circRNA splice sites (38).
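The two quantities described above can be written out as a short Python sketch. The function names are hypothetical, and the handling of a locus with no linear junction reads is an assumption, since the text does not say how that case is treated.

```python
def srpbm(backsplice_reads: int, mapped_reads: int) -> float:
    """Spliced backsplice Reads Per Billion Mapped reads for one library."""
    return backsplice_reads * 1e9 / mapped_reads


def circular_to_linear_ratio(backsplice_reads: int,
                             linear_left: int,
                             linear_right: int) -> float:
    """Backsplice reads divided by the mean of the two flanking linear counts.

    linear_left and linear_right are the reads spanning the linear splice
    junctions on either side of the circRNA splice sites.
    Assumption: with no linear reads the ratio is reported as infinity
    (or 0.0 if there are no backsplice reads either).
    """
    mean_linear = (linear_left + linear_right) / 2
    if mean_linear == 0:
        return float("inf") if backsplice_reads > 0 else 0.0
    return backsplice_reads / mean_linear


# 10 backsplice reads in a library of 50 million mapped reads.
print(srpbm(10, 50_000_000))
# 12 backsplice reads against 4 and 8 linear junction reads.
print(circular_to_linear_ratio(12, 4, 8))
```

A ratio above 1 corresponds to what the Results section later calls a predominant circRNA.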

Mapping and quantification of linear mRNA and lncRNA expression

Sequencing quality was assessed with FastQC (39). After removing adaptors and low-quality reads using cutadapt (40) (-q 10 -e 0.1 -O 10 -m 50), the clean reads were aligned to the human (hg38) genome reference sequence using hisat2 (41) with the default parameters. The bam files were generated, sorted, and deduplicated using SAMtools (v1.3) (42). Read counts were tabulated with HTSeq (43) in 'union' mode with the Gencode human v24 GTF file as a reference. StringTie (v1.3) (44) was also used to estimate the total transcriptional output based on the Gencode human gene annotation (hg38, version 24) (35).

Gene Ontology enrichment analysis

Gene ontology (GO) term enrichment analysis was performed using DAVID (45), by inputting the list of circRNA derived host locus genes.

Statistical analysis

The raw counts were first normalized using trimmed mean of M-values (TMM). Differential circRNA or gene expression was estimated using the edgeR package, and a negative binomial model was used to estimate differential expression between tumor and adjacent non-tumor tissues (FDR <0.02, 2-fold change) (46). Statistical analyses were performed using R 3.3.1 (http://www.r-project.org/).
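The thresholding step (FDR <0.02, 2-fold change) can be sketched in Python as follows. This is not edgeR and does not reproduce TMM normalization or the negative binomial test; it only shows a Benjamini-Hochberg adjustment applied to already computed p-values, followed by the two cut-offs stated above. The function names and toy p-values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(fdr, log2_fold_change, fdr_cutoff=0.02, min_fold=2.0):
    """Apply the FDR and fold-change cut-offs used in the text."""
    return fdr < fdr_cutoff and abs(log2_fold_change) >= math.log2(min_fold)


# Toy p-values for four features.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```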

Results

RAISE: a cocktail of circRNA analysis pipeline

Numerous computational pipelines use backsplice reads to identify circRNAs; however, there is little overlap among the circRNAs they detect. Each algorithm has its own bias and 'blind spots', so we combined several different read aligners to identify more circRNAs and increase the robustness of circRNA identification (22). Previous algorithms detect circRNA backsplice sites but do not provide internal structure information. We combined four available circRNA detection algorithms (mapsplice, acfs, circRNA_finder, and find_circ) into an integrated pipeline called RAISE to improve prediction accuracy and detect the internal exon usage of circRNAs (Fig. 1). RAISE is an easy-to-use shell script pipeline. The four tools were chosen on the basis of previous reviews and research (22,34). We tested RAISE on human liver rRNA-depleted samples from ENCODE. In this RNA-seq library, 5,977 circRNA candidates were detected by mapsplice, 6,672 by acfs, 10,778 by circRNA_finder, and 8,952 by find_circ. A total of 2,891 circRNA candidates were detected by all four tools. After application of the RAISE pipeline, 14,145 circRNA candidates were detected; after filtering out candidates with fewer than two backsplice reads, 8,270 circRNA candidates remained for further analysis. We compared the abundance of circRNA candidates between these tools and RAISE. The abundance reported by acfs and mapsplice was close to that of RAISE, whereas the abundance reported by find_circ and circRNA_finder was lower than that of RAISE (Fig. 2A and B). Furthermore, the internal exon usage of 3,052 high-abundance circRNA candidates was predicted. For example, circABCB4 is composed of exon 13 and exon 14 of the ABCB4 transcript: one read aligned to the backsplice site and its mate contained cis-splice junction information consistent with the linear RNA junction site (Fig. 2C). A total of 1,255 alternative splice sites were also detected in the library; for example, circPLOD2 was derived from exons 2 and 3 of the PLOD2 gene, and there were two cis-splice junctions in intron 2 (Fig. 2D). In these cases, exon usage differed from that of the linear RNAs of the host locus.
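The tool comparison and read-count filter described above can be sketched in Python. The data layout (one dictionary of backsplice site to read count per tool), the function names and the toy coordinates are all hypothetical; how RAISE itself reconciles counts between tools is not described in the text and is not modeled here.

```python
def merge_calls(calls_by_tool):
    """Return (union of sites, sites reported by every tool).

    calls_by_tool maps a tool name to a dict of backsplice site -> reads.
    """
    key_sets = [set(calls) for calls in calls_by_tool.values()]
    if not key_sets:
        return set(), set()
    union = set().union(*key_sets)
    shared_by_all = set.intersection(*key_sets)
    return union, shared_by_all


def filter_by_reads(read_counts, min_reads=2):
    """Keep candidates supported by at least min_reads backsplice reads."""
    return {site: n for site, n in read_counts.items() if n >= min_reads}


# Toy calls from the four tools named in the text.
calls = {
    "mapsplice": {"chr4:100-500": 3, "chr1:10-90": 1},
    "acfs": {"chr4:100-500": 2},
    "circRNA_finder": {"chr4:100-500": 5, "chr2:7-70": 1},
    "find_circ": {"chr4:100-500": 4, "chr1:10-90": 2},
}
union, shared_by_all = merge_calls(calls)
print(len(union), sorted(shared_by_all))
```

The union corresponds to the combined candidate list, and the intersection to the candidates detected by all four tools.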

Figure 2

Summary of RAISE prediction results. (A) Venn diagram comparison of circRNAs identified by the four tools. (B) Scatter plot of circRNA abundance between RAISE and the four tools. (C) The internal structure is the same as that of linear RNAs. (D) The internal structure is different from that of linear RNAs. CircRNA internal structure displayed by IGV. Red and blue bold lines indicate exons, a curved red line indicates the cis-splice junction of the circRNA, a curved black line indicates the backsplice junction of the circRNA, gray and green lines indicate the paired-end alignment reads, and gray peaks indicate the read depth in the genomic region.

CircRNAs display only minor alterations in expression in HCC and adjacent non-tumor tissue

To investigate the circRNA expression profile in HCC, we collected 61 human liver samples. All of the RNA-seq data are non-poly(A)-selected and were downloaded from NCBI and ENCODE: GSE65485 included 50 HCC samples and 5 adjacent non-tumor tissues, GSE77661 included 4 liver samples, and ENCODE included 2 liver samples (Table I). In total, these data comprised 51 HCC samples, 6 adjacent non-tumor samples and 4 normal liver samples. The sequencing depths of these samples ranged from 29.3 to 122.3 million reads. Approximately 95% of these reads were aligned to the human reference genome (hg38). Multidimensional scaling (MDS) (47) analysis showed that the HCC samples were distinct from normal and adjacent non-cancerous tissue samples (Fig. 3A).

Figure 3

Multidimensional scaling (MDS) analysis of the 55 HCC samples. (A) Linear transcriptome. (B) CircRNAs transcriptome. (C) A heatmap displaying the differentially expressed linear gene abundance in 12 HCC samples.

The RAISE pipeline was applied on the unmapped reads (5%), and 59,128 distinct circRNA candidates were identified in 61 liver samples. We chose seven HCC samples and five adjacent non-tumor samples (unpaired samples) to compare the expression patterns of circRNAs between tumor and adjacent non-tumor tissues. First, we found that 80% of the circRNA candidates in these 12 HCC samples were derived from the protein-coding exonic regions while other smaller fractions were antisense, long non-coding RNAs, intergenic regions and intronic regions (Fig. 4A). The genomic features of these circRNAs, e.g., genomic origins, exon numbers, exon length and genomic distance, were compared between HCC and adjacent non-tumor tissue. The exon numbers of most circRNAs were less than five. The length of most exonic circRNAs was ~300–500 nt with a genomic distance of ~1,000–3,000 bp. There was no significant difference in the genomic features between HCC and adjacent non-tumor tissue (Fig. 4B–D).

Figure 4

CircRNAs display no difference between HCC and adjacent non-tumor tissue. (A) The genomic origin of circRNAs. (B) The exon number distribution of circRNAs. (C) The genomic distance of backsplice sites. (D) The length distribution of exonic circRNAs. Tumor represents HCC. Normal represents adjacent non-tumor tissue. (E) Heatmap of several circRNA expression profiles from the samples. N represents adjacent non-tumor tissue. T represents HCC.

Next, we asked whether liver circRNAs are differentially expressed between tumor and adjacent non-tumor tissues. Hundreds of linear RNA genes (mRNAs and lncRNAs) are significantly differentially expressed between these two groups. In contrast, we did not detect any circRNA candidates that were differentially expressed with statistical significance (Fig. 4E). The circRNA circALB (chr4:73405119-73408712), derived from exons 2–4 of the ALB gene, was highly abundant in adjacent non-tumor tissue, but it was not detected in 84% of a total of 50 HCC samples. Of note, ALB mRNA was highly abundant in the liver samples.

CircRNAs show inter- and intra-individual expression diversity

In addition to assessing circRNA expression patterns in HCC, we turned our attention to the liver-specific expression of circRNAs. We calculated the occurrence of each circRNA in these samples, i.e., the number of samples in which it was detected, to distinguish shared circRNAs from unique circRNAs. For example, circAPOA2 was detected in 28 HCC samples; hence, its occurrence was 28. In the 55 HCC samples, the occurrence of circRNAs ranged from 1 to 55. We analyzed the occurrences of all detected circRNA candidates and linear genes. Whereas ~10% of the protein-coding RNAs and non-coding RNAs were expressed in all 55 HCC samples, almost no detected circRNAs were shared by the 55 samples (Fig. 5A). A single gene locus can transcribe multiple circular isoforms (48,49). We therefore asked whether the diversity among the samples is due to variations in circular RNA isoform selection within the gene locus. We defined circRNAs derived from the same transcript as a 'transcript circRNA' and those derived from the same gene locus as a 'gene circRNA'. For example, there are 329 circRNA isoforms in the ALB gene locus; we grouped these circRNAs as circALB. We then compared the diversity in gene- and transcript-level circular RNA expression. Seven of the 16,133 transcript circRNAs and 9 of the 9,696 gene circRNAs were shared among the 55 HCC samples (data not shown). These nine genes expressed circRNAs in all the samples regardless of the circRNA backsplice sites used. The fraction of shared transcript and gene circRNAs remained low compared with that of linear RNAs. To test whether a similar expression pattern exists in mice, we downloaded 12 mouse liver RNA-seq datasets from NCBI SRA, analyzed them with the same pipeline (Table I), and identified 3,801 circRNA candidates in these samples. Only 0.1% of circRNAs were shared in the mouse liver samples, i.e., 0.1% of the transcript circRNAs as well as 0.1% of the gene circRNAs were shared among all samples. This result is consistent with the human liver circRNA expression profile (Fig. 5B).
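The occurrence calculation used throughout this section can be sketched in Python. The input layout (one set of detected circRNA identifiers per sample), the function names and the toy identifiers are hypothetical.

```python
from collections import Counter


def occurrence(samples):
    """Count, for each circRNA, the number of samples in which it is detected.

    samples is a list with one collection of circRNA identifiers per sample.
    """
    counts = Counter()
    for detected in samples:
        counts.update(set(detected))  # count each circRNA once per sample
    return counts


def fraction_shared_by_all(counts, n_samples):
    """Fraction of all detected circRNAs whose occurrence equals n_samples."""
    if not counts:
        return 0.0
    shared = sum(1 for c in counts.values() if c == n_samples)
    return shared / len(counts)


# Toy data: three samples, four distinct circRNAs.
samples = [{"circA", "circB"}, {"circA", "circC"}, {"circA", "circB", "circD"}]
counts = occurrence(samples)
print(counts["circA"], fraction_shared_by_all(counts, len(samples)))
```

The same counting applies at the transcript or gene level if each identifier is first replaced by its host transcript or gene locus.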

Figure 5

Comparison of circRNA and linear RNA distributions in humans and mice. x-axis corresponds to the occurrence of RNA in the recruited samples; y-axis represents the fraction of RNAs. (A) Fifty-five human HCC samples. (B) Twelve mouse liver samples. (C) Thirty human brain samples. (D) Twelve mouse brain samples. (E) Twenty-one human granulocyte samples.

We then asked whether inter-individual circRNA expression diversity is also common in other human tissues. Since circRNAs are highly abundant in the brain (data not shown) (24), we downloaded several batches of brain ribosomal RNA-depleted RNA-seq data from NCBI SRA and analyzed the brain circRNA expression profile with our pipeline (Table I). We identified 50,631 circRNA candidates in 30 human brain samples and found that ~0.5% were shared among all brain samples (Fig. 5C). The same pipeline was applied to the mouse brain samples, and the results were similar to those for the human brain (Fig. 5D). Circular RNAs display greater variation than linear genes in both brain and liver. Given that circRNA expression profiles vary between individuals, we next asked whether circRNAs are reproducibly expressed within one individual. We downloaded 21 human granulocyte ribo-zero RNA-seq datasets from NCBI SRA (Table I). The data came from seven healthy individuals, each sampled three times at least one month apart (31). We used the same pipeline to identify circRNAs and found that 3% were shared among these 21 samples. Even within individuals, circRNA expression was less reproducible than linear RNA expression was between different individuals (Fig. 5E and Table II). Unsupervised hierarchical clustering of the circRNA expression profiles of the 21 samples did not group the samples consistently by donor (Fig. 6).

Table II

CircRNA distribution in different tissues.

Species and tissue	Human brain	Mouse brain	Human liver	Mouse liver	Human granulocytes
Number of samples	30	12	61	12	21
CircRNA candidates	50,631	7,124	59,128	3,801	31,063
Predominant	5,017	516	2,743	244	3,893
Alter splice	3,042	182	5,228	55	1,528
Shared (>20%)	8,910	1,983	469	353	8,509
Shared predominant	405	157	8	19	984
Shared alter splice	232	33	3	4	268
Unique (<20%)	41,721	5,141	58,659	3,448	22,178
Unique predominant	4,612	359	2,735	225	2,909
Unique alter splice	2,869	156	5,224	55	1,318
Transcript circRNA	19,544	4,177	20,530	2,788	12,809
Co-exist	8,086	1,695	8,516	1,291	4,584
Shared co-exist	2,082	634	333	192	1,849

Shared circRNAs are circRNAs that occur in >20% of the recruited samples. Unique circRNAs are the total circRNA candidates minus the shared circRNAs. Predominant circRNAs are circRNAs for which the backsplice reads outnumber the reads that span the linear splice junctions at the circRNA backsplice sites. Alter splice refers to circRNAs whose internal structure differs from that of the linear RNAs. Transcript circRNA refers to the reference transcripts of the circRNAs. Co-exist refers to co-expression of circRNAs and linear RNA genes.

Figure 6

CircRNA expression displays inter- and intra-individual diversity in human granulocyte samples (seven individuals, sampled at three time-points spaced at least 1 month apart). Unsupervised hierarchical clustering of the circRNAs. Heatmap colors represent relative circRNA abundance in each sample.

In short, for the tissues with low circRNA abundance, e.g., liver, the ratio of shared circRNAs was lower. For the tissues with high circRNA abundance, e.g., brain and granulocytes, the ratio increased.

Shared circRNAs are highly abundant and are derived from circRNA hotspot gene loci

We further investigated the characteristics of shared circRNAs and unique circRNAs. We observed that with an increase in the occurrence of circRNAs, their abundance also increased in all the studied samples. In 13 HCC and 17 brain samples, the abundance of the highest occurring circRNA was 5 times more than that of the lowest occurring circRNA. We also analyzed the human granulocyte dataset and found that its circRNA expression profile was similar to that of the human brain samples (Fig. 7).

Figure 7

Comparison of the abundance of different circRNA occurrences. x-axis represents the occurrence of circRNAs in the recruited samples; y-axis shows the abundance of circRNAs (log2). (A) Thirteen human HCC samples. (B) Twelve mouse liver samples. (C) Seventeen human brain samples. (D) Twelve mouse brain samples. (E) Twenty-one human granulocyte samples.

Then, we discovered that a single gene locus can produce multiple circRNAs. We investigated whether the different occurrences of gene circRNAs corresponded to different numbers of circular isoforms. Based on the occurrence of the circRNA gene loci in the recruited samples, these loci were assigned to one of three categories: shared 20%, shared 20–90%, and shared 90%. Shared 20% are gene loci that expressed circRNAs in <20% of the recruited samples. Shared 20–90% are gene loci that expressed circRNAs in >20% and <90% of the recruited samples. Shared 90% are gene loci that expressed circRNAs in >90% of the recruited samples. Most of the shared 20% circRNA gene loci have one or two circRNA isoforms, whereas shared 90% circRNA gene loci have multiple circRNA isoforms. A total of 230 of these gene loci gave rise to >10 circRNAs in human granulocytes (Fig. 8). These gene loci are circRNA hotspot gene loci (38). Compared with the other categories, the shared 90% circRNA loci have not only more distinct circRNA isoforms but also more abundant circRNA isoforms. We then analyzed the alternative splicing and alternative backsplicing of highly shared circRNA gene loci. Typically, the shared circRNA genes express more than two circRNAs in their gene locus; only one or two are highly abundant, whereas the others are of low abundance and diverse. For example, two of the four circular isoforms of circRNA UBXN7 were highly abundant. CircRNA UBXN7-1, derived from exons 3–5, was universally expressed in all the human granulocyte samples. CircRNA UBXN7-2, derived from exons 2–5, was detected in 14 of 21 granulocyte samples. The other two circular isoforms were of low abundance and were detected in fewer than five samples. The alternative backsplicing and alternative cis-splicing of circRNAs were diverse.
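The three-way grouping of gene loci can be sketched in Python. The function name is hypothetical, and because the text leaves loci at exactly 20% or 90% unassigned, placing those boundary cases in the middle category is an assumption made here.

```python
def sharing_category(occurrence: int, n_samples: int) -> str:
    """Assign a circRNA gene locus to one of the three sharing categories.

    occurrence is the number of samples in which the locus expressed a
    circRNA. Assumption: loci at exactly 20% or 90% fall in the middle group.
    """
    fraction = occurrence / n_samples
    if fraction < 0.2:
        return "shared 20%"
    if fraction > 0.9:
        return "shared 90%"
    return "shared 20-90%"


# Toy loci in a cohort of 21 samples, as in the granulocyte dataset.
for occ in (2, 14, 21):
    print(occ, sharing_category(occ, 21))
```

With 21 samples, an isoform detected in 14 samples (such as circRNA UBXN7-2) falls in the middle category, and one detected in all 21 (such as circRNA UBXN7-1) falls in the shared 90% category.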

Figure 8

Number of distinct circRNAs per gene locus in humans and mice. x-axis represents the number of alternative circRNAs in a single gene; y-axis shows the fraction of genes. Based on the occurrence of the circRNA gene loci in the recruited samples, these circRNA gene loci were assigned to one of three categories: shared 20%, shared 20-90%, and shared 90%. Shared 20% are the gene locus-expressed circRNAs in <20% of the recruited samples. Shared 20-90% are the gene locus-expressed circRNAs in >20% and <90% of the recruited samples. Shared 90% are gene locus-expressed circRNAs in >90% of the recruited samples. (A) Fifty-five human HCC samples. (B) Twelve mouse liver samples. (C) Thirty human brain samples. (D) Twelve mouse brain samples. (E) Twenty-one human granulocyte samples.

Furthermore, we investigated the relative circular to linear transcript abundance. We defined predominant circRNAs as those with a circular to linear ratio >1. We found 2,045 circRNA isoforms that were predominant transcripts among the 55 HCC samples, but only 8 predominant circRNAs were shared in the HCC samples. Moreover, in the 30 brain samples, there were 5,017 predominant circRNAs, of which 405 were shared (Table II). Predominant circRNAs were not significantly enriched among shared circRNAs. In summary, shared circRNAs are highly abundant, and the shared circRNA gene loci have multiple distinct circRNAs. In these shared circRNA gene loci, circRNA expression demonstrates diverse alternative cis-splicing and alternative backsplicing.

Comparison of the tissue-specific shared circRNAs in humans

Even though a large number of circRNAs are inter-individually diverse and vary among the samples, a small proportion of circular RNAs is shared. Previous studies indicated that the expression of circular RNA is related to the genomic origin of the linear transcripts (5,23,24) and that circular RNAs regulate the transcription of host mRNAs (9,50,51). We conducted a Gene Ontology analysis on the linear transcripts derived from the shared liver and brain circRNAs, which revealed significant differences between them. Since there are almost no shared circRNAs in the liver, we considered the circRNAs detected in >20% of the samples as shared circRNAs in the liver. We found that liver samples were enriched with lipoprotein metabolic process and extracellular exosome while brain samples were enriched with protein phosphorylation, postsynaptic density, and protein kinase activity (Fig. 9). Both liver and brain samples contained numerous protein binding genes, and most of the GO terms were related to tissue-specific functions. Highly represented gene categories included ApoE and ALB genes in the liver and RIMS1, HTT and KLHL24 in the brain (Fig. 9 and Table III).

Figure 9

GO analysis of the shared circRNA host genes. (A) Brain and (B) liver. Enriched terms are grouped by GO category: biological process (red), cellular component (green), and molecular function (blue).

Table III

CircRNAs shared in different tissues.

chr	start	end	gene	transcript	strand	circRNA
chr1	26729380	26774901	ARID1A	ARID1A-201	+	chr1:26729650-26732792(+)
chr1	1.17E+08	1.18E+08	MAN1A2	MAN1A2-001	+	chr1:117402185-117442325(+)
chr1	1.17E+08	1.18E+08	MAN1A2	MAN1A2-001	+	chr1:117402185-117420649(+)
chr1	1.17E+08	1.18E+08	MAN1A2	MAN1A2-001	+	chr1:117402185-117414831(+)
chr1	1.17E+08	1.18E+08	MAN1A2	MAN1A2-001	+	chr1:117402185-117405645(+)
chr1	66958911	66960078	MIER1	MIER1-005	+	chr1:66958058-66963160(+)
chr1	26921726	26946862	NUDC	NUDC-001	+	chr1:26942659-26943065(+)
chr1	1.81E+08	1.81E+08	STX6	STX6-002	−	chr1:180984676-180993425(−)
chr1	21715124	21721146	USP48	USP48-008	−	chr1:21715388-21721764(−)
chr10	5709626	5714574	FAM208B	FAM208B-014	+	chr10:5699524-5714207(+)
chr10	1.02E+08	1.02E+08	FBXW4	FBXW4-002	−	chr10:101667885-101676436(−)
chr12	1.23E+08	1.24E+08	RILPL1	RILPL1-001	−	chr12:123498543-123499536(−)
chr13	45962176	46052759	ZC3H13	ZC3H13-002	−	chr13:46003138-46020557(−)
chr14	45121588	45131261	FKBP3	FKBP3-003	−	chr14:45118027-45130790(−)
chr14	49820096	49852780	NEMF	NEMF-005	−	chr14:49825866-49831361(−)
chr14	39179090	39182750	PNN	PNN-005	+	chr14:39179090-39179462(+)
chr14	22909490	22911792	RBM23	RBM23-012	−	chr14:22909482-22911403(−)
chr14	22905585	22919149	RBM23	RBM23-004	−	chr14:22906194-22911403(−)
chr18	9136805	9235820	ANKRD12	ANKRD12-009	+	chr18:9182381-9221999(+)
chr18	21704957	21864974	MIB1	MIB1-004	+	chr18:21765771-21779685(+)
chr19	8461969	8465372	HNRNPM	HNRNPM-014	+	chr19:8455404-8463686(+)
chr2	1.12E+08	1.12E+08	ZC3H6	ZC3H6-001	+	chr2:112299848-112300029(+)
chr2	1.12E+08	1.12E+08	ZC3H6	ZC3H6-001	+	chr2:112299848-112325197(+)
chr22	50372072	50444391	PPP6R2	PPP6R2-004	+	chr22:50372019-50394135(+)
chr3	1.96E+08	1.97E+08	RNF168	RNF168-001	−	chr3:196487398-196488683(−)
chr3	1.58E+08	1.59E+08	RSRC1	RSRC1-201	+	chr3:158122102-158123991(+)
chr3	1.7E+08	1.7E+08	SEC62	SEC62-009	+	chr3:169976945-169988359(+)
chr4	1.52E+08	1.53E+08	FBXW7	FBXW7-004	−	chr4:152411302-152412529(−)
chr4	1.28E+08	1.28E+08	LARP1B	LARP1B-005	+	chr4:128074459-128077962(+)
chr6	4836098	4954373	CDYL	CDYL-007	+	chr6:4891712-4892379(+)
chr6	18223868	18264823	DEK	DEK-001	−	chr6:18236451-18258405(−)
chr7	1E+08	1E+08	ZKSCAN1	ZKSCAN1-001	+	chr7:100023418-100024307(+)
chr8	61623710	61714596	ASPH	ASPH-001	−	chr8:61618977-61653660(−)
chr8	67083848	67195611	CSPP1	CSPP1-003	+	chr8:67131950-67137603(+)
chr8	1.08E+08	1.08E+08	EMC2	EMC2-004	+	chr8:108449822-108455930(+)
chr9	5919008	6007787	KIAA2026	KIAA2026-002	−	chr9:5968018-5988545(−)
chr9	33948374	33989043	UBAP2	UBAP2-004	−	chr9:33971650-33973237(−)
chr9	33994481	34048872	UBAP2	UBAP2-010	−	chr9:33986759-34017189(−)
chr9	33921693	34048901	UBAP2	UBAP2-001	−	chr9:33960825-33989126(−)

Furthermore, we analyzed the circRNAs shared among the liver, brain, blood, granulocyte and heart samples and identified 39 circRNAs common to these tissues (Table III). Some of these shared circRNAs have been previously validated by experiments, e.g., circZKSCAN1 (52) and circMAN1A2 (50,53). CDR1as was detected in the liver, brain and heart samples, but not in the blood or granulocyte samples.
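As an illustration of how a set of tissue-shared circRNAs can be derived, the following minimal Python sketch parses backsplice coordinates written in the format of the last column of Table III and intersects the per-tissue sets. It is not the RAISE implementation. The names `Backsplice`, `parse_backsplice` and `shared_circrnas`, the tissue assignments and the `chrX` coordinate are hypothetical and serve only as an example.

```python
import re
from typing import NamedTuple


class Backsplice(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


# Table III prints the minus strand with U+2212; plain ASCII "-" is accepted too.
_COORD = re.compile(r"^(chr[\w.]+):(\d+)-(\d+)\(([+\-\u2212])\)$")


def parse_backsplice(text: str) -> Backsplice:
    """Parse a string such as 'chr7:100023418-100024307(+)'."""
    match = _COORD.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized circRNA coordinate: {text!r}")
    chrom, start, end, strand = match.groups()
    # Normalize the strand symbol to ASCII.
    strand = "+" if strand == "+" else "-"
    return Backsplice(chrom, int(start), int(end), strand)


def shared_circrnas(by_tissue: dict[str, set[Backsplice]]) -> set[Backsplice]:
    """Return the circRNAs present in every tissue (empty set if no tissues)."""
    sets = list(by_tissue.values())
    return set.intersection(*sets) if sets else set()


# Toy input: two coordinates taken from Table III plus one hypothetical
# tissue-specific candidate (chrX). The tissue assignments are illustrative.
zkscan1 = parse_backsplice("chr7:100023418-100024307(+)")
dek = parse_backsplice("chr6:18236451-18258405(\u2212)")
toy_only = parse_backsplice("chrX:100-200(+)")

toy_tissues = {
    "liver": {zkscan1, dek, toy_only},
    "brain": {zkscan1, dek},
    "heart": {zkscan1, dek},
}
shared = shared_circrnas(toy_tissues)
```

Keying each circRNA by chromosome, backsplice start, backsplice end and strand treats two candidates as the same circRNA only when their backsplice junctions are identical, which is the assumption made in this sketch.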

Conservation of the identified circRNA candidates between human and mouse

Previous research indicated that circular RNAs are evolutionarily conserved in function (2,23,49,54). First, most circRNAs originate from CDS regions, which are evolutionarily conserved in the genome. Second, the backsplice sites of circRNAs are conserved. We compared circRNA conservation between human and mouse using a previously described computational method (55). In total, we identified 169,044 circRNA candidates in 178 human samples and 9,886 circRNA candidates in 24 mouse samples. Of these, 83,389 human and 8,615 mouse circRNA candidates were expressed from gene loci, and 3,579 gene locus-derived circRNA candidates were conserved between human and mouse.
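To put these counts in proportion, the short Python calculation below expresses the 3,579 conserved candidates as a percentage of the gene locus-derived candidates in each species. It assumes that the three counts are measured in the same unit, which the text does not state explicitly. The variable and function names are illustrative.

```python
# Counts reported in the text above.
conserved = 3_579    # gene locus-derived candidates conserved between species
human_total = 83_389  # gene locus-derived candidates in human
mouse_total = 8_615   # gene locus-derived candidates in mouse


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


human_pct = percent(conserved, human_total)
mouse_pct = percent(conserved, mouse_total)

print(f"conserved share of human candidates: {human_pct:.1f}%")  # 4.3%
print(f"conserved share of mouse candidates: {mouse_pct:.1f}%")  # 41.5%
```

Under this assumption, the conserved set is a small fraction of the human candidates but a large fraction of the mouse candidates. This is expected when the human catalogue is built from far more samples (178 versus 24).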

Discussion

Eukaryotic circRNAs are low-abundance but biochemically stable transcripts that are expressed from diverse genomic locations. The abundance of circRNAs is ~1–3% of the level of poly(A)+ RNAs (49), and most circRNAs exist in low abundance. Therefore, identifying all of the expressed circRNAs is difficult. Each circRNA prediction algorithm brings its own bias and 'blind spots' (22). We addressed this problem by combining distinct algorithms to yield a more trustworthy and sensitive output (34). It was reported that circRNA isoforms from the same host transcript share the same backsplice sites with different internal exons (24). As a result, the internal exon composition of circRNAs cannot simply be predicted using junction exons and linear RNA exon composition (24,48,56). RAISE is designed to detect circRNA internal exon composition and predict circRNA transcript sequences (Fig. 2C). The availability of coverage and splice information helps in the identification of a circRNA and its exon composition. The internal exon usage of circRNAs was used to predict circRNA transcript sequences and investigate circRNA functions. We used RAISE to compare the circRNA expression profiles in HCC with those in adjacent non-tumor tissue samples and did not find any significant differences. Even though there is no significant differential expression between tumor and adjacent non-tumor tissue, we observed that circRNA expression profiles are diverse between individuals and are independent of linear gene expression. For circRNAs, this variation is a matter of whether a transcript is detected at all, whereas for linear genes it is a matter of whether the abundance is high or low. Since circRNAs are not highly enriched in HCC-affected tissues, we tested whether various other tissues possessed the same expression patterns.
In the rRNA-depleted brain RNA-seq data, the proportion of shared circRNAs was 0.5%, which is higher than the proportion in the liver but lower than that of linear genes. The brain and liver samples displayed high inter-individual variation in the expression of circRNAs. We collected 21 granulocyte samples from seven individuals at three time-points to determine whether circRNAs were reproducibly expressed within one individual. The results showed that circRNA expression is not reproducible within one individual. Briefly, the circRNAs fall into two categories: the randomly or variably expressed circRNAs, and the robustly expressed circRNAs. The biological roles of both types are not yet clear and thus require functional studies. Several previous studies have highlighted a few circRNAs which are highly abundant and ubiquitously expressed (1,55). In agreement with previous research, individual circRNA expression seems to be highly stochastic. However, circRNAs that are variably expressed in different samples may also trap miRNAs (57). Meanwhile, several online tools and databases, including Arraystar's circRNA target prediction software, Circ2Traits, CircInteractome and CircNet (58–61), have been developed to predict circRNA-miRNA interaction networks. Circ2Traits is a comprehensive database for circular RNAs with potential association with disease and traits (58). CircNet and CircInteractome both predict the miRNA targets of circRNAs and create circRNA-miRNA interaction networks (59,60). We also observed that most of the circRNAs are detected in only a few cell types and that they are not as cell-type-specific as mRNAs (55). The variation and diversity of circRNA expression profiles may be due to the large number of circRNA transcripts expressed at a low level. Interestingly, most of the circRNAs co-exist with linear RNA transcripts; however, only a small portion of these circRNAs are the predominant transcripts (Table II).
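The proportions discussed above (circRNAs shared by all samples versus circRNAs detected in only one or two samples) can be computed from per-sample detection calls. The following Python sketch shows one way to do this. It is not part of RAISE. The function names, sample labels and circRNA identifiers are hypothetical, and detection is treated as a simple present/absent call per sample.

```python
from collections import Counter


def detection_counts(samples: dict[str, set[str]]) -> Counter:
    """Count, for each circRNA, the number of samples in which it was detected."""
    counts: Counter = Counter()
    for detected in samples.values():
        counts.update(detected)
    return counts


def sharing_summary(samples: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of circRNAs found in every sample and in at most two samples."""
    counts = detection_counts(samples)
    total = len(counts)
    if total == 0:
        return {"shared_by_all": 0.0, "in_one_or_two": 0.0}
    n_samples = len(samples)
    shared = sum(1 for n in counts.values() if n == n_samples)
    rare = sum(1 for n in counts.values() if n <= 2)
    return {"shared_by_all": shared / total, "in_one_or_two": rare / total}


# Toy data: four hypothetical samples and six hypothetical circRNA identifiers.
toy_samples = {
    "sample_1": {"circA", "circB", "circC"},
    "sample_2": {"circA", "circD"},
    "sample_3": {"circA", "circB", "circE"},
    "sample_4": {"circA", "circF"},
}
summary = sharing_summary(toy_samples)
```

In this toy input, one of the six circRNAs is present in all four samples and the remaining five are each present in one or two samples. The same summary can be applied to the three time-points of a single individual to assess intra-individual reproducibility.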
There are several shared circRNAs among the recruited samples in the same tissue. The shared circRNAs usually have multiple circular isoforms in the gene locus, and these gene loci are circRNA hotspots. We conducted a Gene Ontology analysis on the host genes that gave rise to shared circRNAs and showed that these genes are related to tissue-specific functions. We used our RAISE pipeline to show that circRNAs have both intra- and inter-individual variations in their expression patterns. Our findings can be helpful in identifying novel circRNAs and designing better therapeutic approaches. Furthermore, our results suggest that only the robustly expressed circRNAs are candidates for use as biomarkers.

61 in total

1. Scrambled exons.

Authors: J M Nigro; K R Cho; E R Fearon; S E Kern; J M Ruppert; J D Oliner; K W Kinzler; B Vogelstein
Journal: Cell Date: 1991-02-08 Impact factor: 41.582

2. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed.

Authors: Agnieszka Rybak-Wolf; Christin Stottmeister; Petar Glažar; Marvin Jens; Natalia Pino; Sebastian Giusti; Mor Hanan; Mikaela Behm; Osnat Bartok; Reut Ashwal-Fluss; Margareta Herzog; Luisa Schreyer; Panagiotis Papavasileiou; Andranik Ivanov; Marie Öhman; Damian Refojo; Sebastian Kadener; Nikolaus Rajewsky
Journal: Mol Cell Date: 2015-04-23 Impact factor: 17.970

3. Regulatory consequences of neuronal ELAV-like protein binding to coding and non-coding RNAs in human brain.

Authors: Claudia Scheckel; Elodie Drapeau; Maria A Frias; Christopher Y Park; John Fak; Ilana Zucker-Scharff; Yan Kou; Vahram Haroutunian; Avi Ma'ayan; Joseph D Buxbaum; Robert B Darnell
Journal: Elife Date: 2016-02-19 Impact factor: 8.140

4. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Authors: Ewan Birney; John A Stamatoyannopoulos; Anindya Dutta; Roderic Guigó; Thomas R Gingeras; Elliott H Margulies; Zhiping Weng; Michael Snyder; Emmanouil T Dermitzakis; Robert E Thurman; Michael S Kuehn; Christopher M Taylor; Shane Neph; Christoph M Koch; Saurabh Asthana; Ankit Malhotra; Ivan Adzhubei; Jason A Greenbaum; Robert M Andrews; Paul Flicek; Patrick J Boyle; Hua Cao; Nigel P Carter; Gayle K Clelland; Sean Davis; Nathan Day; Pawandeep Dhami; Shane C Dillon; Michael O Dorschner; Heike Fiegler; Paul G Giresi; Jeff Goldy; Michael Hawrylycz; Andrew Haydock; Richard Humbert; Keith D James; Brett E Johnson; Ericka M Johnson; Tristan T Frum; Elizabeth R Rosenzweig; Neerja Karnani; Kirsten Lee; Gregory C Lefebvre; Patrick A Navas; Fidencio Neri; Stephen C J Parker; Peter J Sabo; Richard Sandstrom; Anthony Shafer; David Vetrie; Molly Weaver; Sarah Wilcox; Man Yu; Francis S Collins; Job Dekker; Jason D Lieb; Thomas D Tullius; Gregory E Crawford; Shamil Sunyaev; William S Noble; Ian Dunham; France Denoeud; Alexandre Reymond; Philipp Kapranov; Joel Rozowsky; Deyou Zheng; Robert Castelo; Adam Frankish; Jennifer Harrow; Srinka Ghosh; Albin Sandelin; Ivo L Hofacker; Robert Baertsch; Damian Keefe; Sujit Dike; Jill Cheng; Heather A Hirsch; Edward A Sekinger; Julien Lagarde; Josep F Abril; Atif Shahab; Christoph Flamm; Claudia Fried; Jörg Hackermüller; Jana Hertel; Manja Lindemeyer; Kristin Missal; Andrea Tanzer; Stefan Washietl; Jan Korbel; Olof Emanuelsson; Jakob S Pedersen; Nancy Holroyd; Ruth Taylor; David Swarbreck; Nicholas Matthews; Mark C Dickson; Daryl J Thomas; Matthew T Weirauch; James Gilbert; Jorg Drenkow; Ian Bell; XiaoDong Zhao; K G Srinivasan; Wing-Kin Sung; Hong Sain Ooi; Kuo Ping Chiu; Sylvain Foissac; Tyler Alioto; Michael Brent; Lior Pachter; Michael L Tress; Alfonso Valencia; Siew Woh Choo; Chiou Yu Choo; Catherine Ucla; Caroline Manzano; Carine Wyss; Evelyn Cheung; Taane G Clark; James B Brown; Madhavan Ganesh; Sandeep Patel; Hari Tammana; 
Jacqueline Chrast; Charlotte N Henrichsen; Chikatoshi Kai; Jun Kawai; Ugrappa Nagalakshmi; Jiaqian Wu; Zheng Lian; Jin Lian; Peter Newburger; Xueqing Zhang; Peter Bickel; John S Mattick; Piero Carninci; Yoshihide Hayashizaki; Sherman Weissman; Tim Hubbard; Richard M Myers; Jane Rogers; Peter F Stadler; Todd M Lowe; Chia-Lin Wei; Yijun Ruan; Kevin Struhl; Mark Gerstein; Stylianos E Antonarakis; Yutao Fu; Eric D Green; Ulaş Karaöz; Adam Siepel; James Taylor; Laura A Liefer; Kris A Wetterstrand; Peter J Good; Elise A Feingold; Mark S Guyer; Gregory M Cooper; George Asimenos; Colin N Dewey; Minmei Hou; Sergey Nikolaev; Juan I Montoya-Burgos; Ari Löytynoja; Simon Whelan; Fabio Pardi; Tim Massingham; Haiyan Huang; Nancy R Zhang; Ian Holmes; James C Mullikin; Abel Ureta-Vidal; Benedict Paten; Michael Seringhaus; Deanna Church; Kate Rosenbloom; W James Kent; Eric A Stone; Serafim Batzoglou; Nick Goldman; Ross C Hardison; David Haussler; Webb Miller; Arend Sidow; Nathan D Trinklein; Zhengdong D Zhang; Leah Barrera; Rhona Stuart; David C King; Adam Ameur; Stefan Enroth; Mark C Bieda; Jonghwan Kim; Akshay A Bhinge; Nan Jiang; Jun Liu; Fei Yao; Vinsensius B Vega; Charlie W H Lee; Patrick Ng; Atif Shahab; Annie Yang; Zarmik Moqtaderi; Zhou Zhu; Xiaoqin Xu; Sharon Squazzo; Matthew J Oberley; David Inman; Michael A Singer; Todd A Richmond; Kyle J Munn; Alvaro Rada-Iglesias; Ola Wallerman; Jan Komorowski; Joanna C Fowler; Phillippe Couttet; Alexander W Bruce; Oliver M Dovey; Peter D Ellis; Cordelia F Langford; David A Nix; Ghia Euskirchen; Stephen Hartman; Alexander E Urban; Peter Kraus; Sara Van Calcar; Nate Heintzman; Tae Hoon Kim; Kun Wang; Chunxu Qu; Gary Hon; Rosa Luna; Christopher K Glass; M Geoff Rosenfeld; Shelley Force Aldred; Sara J Cooper; Anason Halees; Jane M Lin; Hennady P Shulha; Xiaoling Zhang; Mousheng Xu; Jaafar N S Haidar; Yong Yu; Yijun Ruan; Vishwanath R Iyer; Roland D Green; Claes Wadelius; Peggy J Farnham; Bing Ren; Rachel A Harte; Angie S Hinrichs; Heather 
Trumbower; Hiram Clawson; Jennifer Hillman-Jackson; Ann S Zweig; Kayla Smith; Archana Thakkapallayil; Galt Barber; Robert M Kuhn; Donna Karolchik; Lluis Armengol; Christine P Bird; Paul I W de Bakker; Andrew D Kern; Nuria Lopez-Bigas; Joel D Martin; Barbara E Stranger; Abigail Woodroffe; Eugene Davydov; Antigone Dimas; Eduardo Eyras; Ingileif B Hallgrímsdóttir; Julian Huppert; Michael C Zody; Gonçalo R Abecasis; Xavier Estivill; Gerard G Bouffard; Xiaobin Guan; Nancy F Hansen; Jacquelyn R Idol; Valerie V B Maduro; Baishali Maskeri; Jennifer C McDowell; Morgan Park; Pamela J Thomas; Alice C Young; Robert W Blakesley; Donna M Muzny; Erica Sodergren; David A Wheeler; Kim C Worley; Huaiyang Jiang; George M Weinstock; Richard A Gibbs; Tina Graves; Robert Fulton; Elaine R Mardis; Richard K Wilson; Michele Clamp; James Cuff; Sante Gnerre; David B Jaffe; Jean L Chang; Kerstin Lindblad-Toh; Eric S Lander; Maxim Koriabine; Mikhail Nefedov; Kazutoyo Osoegawa; Yuko Yoshinaga; Baoli Zhu; Pieter J de Jong
Journal: Nature Date: 2007-06-14 Impact factor: 49.962

5. Identification of HBV-MLL4 Integration and Its Molecular Basis in Chinese Hepatocellular Carcinoma.

Authors: Hua Dong; Lan Zhang; Ziliang Qian; Xuehua Zhu; Guanshan Zhu; Yunqin Chen; Xiaoying Xie; Qinghai Ye; Jie Zang; Zhenggang Ren; Qunsheng Ji
Journal: PLoS One Date: 2015-04-22 Impact factor: 3.240

6. CIRI: an efficient and unbiased algorithm for de novo circular RNA identification.

Authors: Yuan Gao; Jinfeng Wang; Fangqing Zhao
Journal: Genome Biol Date: 2015-01-13 Impact factor: 13.583

7. HTSeq--a Python framework to work with high-throughput sequencing data.

Authors: Simon Anders; Paul Theodor Pyl; Wolfgang Huber
Journal: Bioinformatics Date: 2014-09-25 Impact factor: 6.937

8. Circular RNA profiling reveals an abundant circHIPK3 that regulates cell growth by sponging multiple miRNAs.

Authors: Qiupeng Zheng; Chunyang Bao; Weijie Guo; Shuyi Li; Jie Chen; Bing Chen; Yanting Luo; Dongbin Lyu; Yan Li; Guohai Shi; Linhui Liang; Jianren Gu; Xianghuo He; Shenglin Huang
Journal: Nat Commun Date: 2016-04-06 Impact factor: 14.919

9. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Authors: Mark D Robinson; Davis J McCarthy; Gordon K Smyth
Journal: Bioinformatics Date: 2009-11-11 Impact factor: 6.937

10. Exon Skipping Is Correlated with Exon Circularization.

Authors: Steven Kelly; Chris Greenman; Peter R Cook; Argyris Papantonis
Journal: J Mol Biol Date: 2015-02-26 Impact factor: 5.469
