
Extracellular circular RNA profiles in plasma and urine of healthy, male college athletes.

Elizabeth Hutchins1, Rebecca Reiman1, Joseph Winarta1, Taylor Beecroft1, Ryan Richholt1, Matt De Both1, Khalouk Shahbander1, Elizabeth Carlson1, Alex Janss1, Ashley Siniard1, Chris Balak1, Ryan Bruhns1, Timothy G Whitsett1, Roger McCoy2, Matthew Anastasi2, April Allen1, Brian Churas1, Matthew Huentelman1, Kendall Van Keuren-Jensen3.   

Abstract

Circular RNA (circRNA) are a recently discovered class of RNA characterized by a covalently-bonded back-splice junction. As circRNAs are inherently more stable than other RNA species, they may be detected extracellularly in peripheral biofluids and provide novel biomarkers. While circRNA have been identified previously in peripheral biofluids, there are few datasets for circRNA junctions from healthy controls. We collected 134 plasma and 114 urine samples from 54 healthy, male college athlete volunteers, and used RNASeq to determine circRNA content. The intersection of six bioinformatic tools identified 965 high-confidence, characteristic circRNA junctions in plasma and 72 in urine. Highly-expressed circRNA junctions were validated by qRT-PCR. Longitudinal samples were collected from a subset, demonstrating circRNA expression was stable over time. Lastly, the ratio of circular to linear transcripts was higher in plasma than urine. This study provides a valuable resource for characterization of circRNA in plasma and urine from healthy volunteers, one that can be developed and reassessed as researchers probe the circRNA contents of biofluids across physiological changes and disease states.
© 2021. The Author(s).

Year:  2021        PMID: 34711851      PMCID: PMC8553830          DOI: 10.1038/s41597-021-01056-w

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background and Summary

The advent of next-generation sequencing has spurred the discovery of a growing list of RNA biotypes, many of which are detectable across species, are present in numerous biofluids, and have biological functions. While many studies have focused on microRNAs (miRNA), several other small RNA species (e.g., piwi-interacting RNAs (piRNA), tRNA fragments, and Y RNA fragments) have been detected across a range of biofluids and are being developed as clinical biomarkers[1-4]. In addition to these linear RNAs, circular RNAs (circRNA), which have a covalently closed loop structure, have gained attention. CircRNAs were first discovered by electron microscopy in the 1970s, as viroid molecules[5]. Nearly two decades later, circRNAs were identified for a handful of mammalian genes[6-8]. Though initially thought to be rare splicing events, circRNAs have recently been identified as an abundant, endogenous RNA species in organisms ranging from Archaea to yeast, plants, worms, flies, fish, and mammals[9-11]. Additionally, circRNAs are abundantly expressed in many human tissues and cell types, and circRNA expression changes during development and in response to extrinsic factors such as stress, immune responses, and hormonal stimuli[12-17]. These endogenous RNAs are characterized by their circular structure, which is formed by a back-splicing event that covalently links the 3′ “tail” splice donor to the upstream 5′ splice acceptor “head” of the transcript, creating a back-spliced, or “head-to-tail”, junction. While circRNA function is still being elucidated, there are examples of circRNAs inhibiting microRNAs, regulating alternative splicing, and modulating the expression of their parental genes[18-22].
In comparison to their linear counterparts, circRNA transcripts can be more abundant and more stable, as they are resistant to linear decay mechanisms and lack both 5′-3′ polarity and polyadenylated tails[14,21,23,24], suggesting their feasibility as stable biomarkers. The stability and detectability of circRNAs in biofluids, including saliva[25], blood[24,26-30], and urine[31-33], stem in part from their protection within extracellular vesicles[28,34-36]. CircRNA expression is altered in multiple diseases, including preeclampsia, glioblastoma, and colorectal cancer[30,37,38]. More recently, circRNA expression in tumor tissues, as determined by next-generation sequencing, was shown to correlate with disease progression[39,40]. Urine circRNAs correlated with kidney rejection post-transplant[31], and differentially expressed circRNAs have been identified in plasma exosomes of lung cancer patients versus controls[41]. Most studies of circRNA have small sample sizes or rely on targeted microarray data rather than discovery-based methods. The stability and abundance of circRNAs led us to investigate their detection in two easily accessed biofluids: plasma and urine. This dataset includes more than 100 samples of each of these biofluids from 54 volunteers. In some cases, multiple samples were collected longitudinally from the same participant, allowing us to assess the reliability of circRNA detection in biofluids. As the volunteers were part of a larger study elucidating concussion biomarkers in male college athletes, the samples are derived from young (18–25), healthy, male volunteers, as described in Table 1. The longitudinal plasma and urine collections are listed in Online-only Table 1, together with the number of circRNAs identified in each biofluid and the number detected in both biofluids at the same collection. We identified circRNAs in plasma (n = 134) and urine (n = 114) samples from RNAseq data, with each sample analyzed by six different bioinformatic tools (Fig. 1).
The intersection of the 6 bioinformatic tools provides a catalog for circRNA in plasma (Fig. 2a) and urine (Fig. 2c).
Table 1

Healthy Participant Characteristics.

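Participant | # plasma samples | # urine samples | Age | Racial or Ethnic Category | Protocol | dbGaP Participant ID
001 | 1 | 1 | 19 | AA | RNAseq | 2048063
002 | 1 | 1 | 20 | AA | RNAseq | 2048064
004 | 3 | 3 | 22 | AA | RNAseq | 2048065
005 | 5 | 1 | 21 | AA | RNAseq | 2048066
006 | 3 | 3 | 22 | W | RNAseq | 2048067
007 | 3 | 5 | 20 | AA | RNAseq | 2048068
008 | 2 | 0 | 21 | H | RNAseq | 2048069
010 | 3 | 2 | 21 | AA | RNAseq | 2048070
011 | 4 | 6 | 20 | AA | RNAseq | 2048071
013 | 2 | 5 | 20 | HAW | RNAseq | 2048072
014 | 3 | 4 | 22 | W | RNAseq | 2048073
015 | 1 | 1 | 20 | HAW | RNAseq | 2048074
019 | 3 | 1 | 23 | AA | RNAseq | 2048075
022 | 2 | 6 | N/A | W | RNAseq | 2048076
023 | 0 | 1 | 19 | AA | RNAseq | 2048077
024 | 7 | 6 | 21 | AA | RNAseq | 2048078
025 | 2 | 2 | 20 | AA | RNAseq | 2048079
029 | 2 | 2 | 21 | W | RNAseq | 2048080
030 | 3 | 2 | 22 | AA | RNAseq | 2048081
031 | 5 | 2 | 23 | W | RNAseq | 2048082
036 | 0 | 1 | 21 | W | RNAseq | 2048083
039 | 4 | 3 | 22 | W | RNAseq | 2048084
042 | 1 | 0 | 23 | AA | RNAseq | 2048085
044 | 1 | 2 | 22 | W | RNAseq | 2048086
045 | 5 | 5 | 20 | W | RNAseq | 2048087
046 | 7 | 4 | 22 | AA | RNAseq | 2048088
048 | 2 | 1 | 21 | AA and H | RNAseq | 2048089
049 | 5 | 3 | 22 | AA | RNAseq | 2048090
170 | 1 | 1 | 18 | Asian | RNAseq | 2048091
201 | 3 | 5 | N/A | AA | RNAseq | 2048092
202 | 0 | 3 | 22 | AA | RNAseq | 2048093
203 | 9 | 4 | 22 | AA | RNAseq | 2048094
204 | 1 | 0 | N/A | AA | RNAseq | 2048095
205 | 1 | 1 | N/A | N/A | RNAseq | 2048095
206 | 5 | 0 | 20 | AA | RNAseq | 2048097
207 | 2 | 1 | 20 | HAW | RNAseq | 2048098
208 | 1 | 0 | 19 | AA | RNAseq | 2048099
209 | 1 | 1 | 20 | W and H | RNAseq | 2048100
210 | 1 | 1 | 21 | AA | RNAseq | 2048101
211 | 1 | 0 | N/A | N/A | RNAseq | 2048102
212 | 1 | 0 | 20 | W | RNAseq | 2048103
213 | 1 | 3 | 22 | AA | RNAseq | 2048104
214 | 0 | 2 | 20 | W | RNAseq | 2048105
215 | 0 | 1 | 20 | H | RNAseq | 2048106
216 | 1 | 3 | 19 | AA | RNAseq | 2048107
218 | 1 | 1 | 20 | AA | RNAseq | 2048109
220 | 1 | 0 | N/A | N/A | RNAseq | 2048111
221 | 8 | 3 | 19 | AA | RNAseq | 2048112
222 | 1 | 0 | N/A | N/A | RNAseq | 2048113
223 | 1 | 3 | 18 | AA | RNAseq | 2048114
224 | 1 | 2 | 19 | HAW | RNAseq | 2048115
226 | 4 | 1 | 21 | AA | RNAseq | 2048117
227 | 7 | 2 | 21 | W | RNAseq | 2048118
228 | 1 | 3 | 19 | AA | RNAseq | 2048119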

AA = African American, Asian = Asian or Asian American, H = Hispanic or Latino, HAW = Native Hawaiian, W = White, N/A = not available.

Online-only Table 1

Longitudinal sample collection with circRNAs detected across all informatic tools.

Subject | Collection date | Biofluid | # of circRNA detected | # of circRNA detected in both
1 | Time point #1 | plasma | 119 | 7
1 | Time point #1 | urine | 39 | 7
2 | Time point #1 | plasma | 348 | 6
2 | Time point #1 | urine | 33 | 6
4 | Time point #1 | plasma | 220 | 6
4 | Time point #1 | urine | 20 | 6
4 | 86 days from Time point #1 | plasma | 329 | 10
4 | 86 days from Time point #1 | urine | 40 | 10
4 | 98 days from Time point #1 | plasma | 104 | 5
4 | 98 days from Time point #1 | urine | 30 | 5
5 | Time point #1 | plasma | 1 | 1
5 | Time point #1 | urine | 23 | 1
5 | 65 days from Time point #1 | plasma | 44 | NA
5 | 72 days from Time point #1 | plasma | 35 | NA
5 | 98 days from Time point #1 | plasma | 67 | NA
5 | 121 days from Time point #1 | plasma | 272 | NA
6 | Time point #1 | plasma | 90 | 6
6 | Time point #1 | urine | 20 | 6
6 | 51 days from Time point #1 | urine | 17 | NA
6 | 58 days from Time point #1 | plasma | 60 | NA
6 | 86 days from Time point #1 | plasma | 294 | 6
6 | 86 days from Time point #1 | urine | 30 | 6
7 | Time point #1 | plasma | 158 | 4
7 | Time point #1 | urine | 10 | 4
7 | 42 days from Time point #1 | urine | 23 | NA
7 | 58 days from Time point #1 | plasma | 168 | 4
7 | 58 days from Time point #1 | urine | 20 | 4
7 | 79 days from Time point #1 | urine | 12 | NA
7* | NA | plasma | 79 | 3
7* | NA | urine | 20 | 3
8 | Time point #1 | plasma | 545 | NA
8 | 117 days from Time point #1 | plasma | 406 | NA
10 | Time point #1 | urine | 24 | NA
10 | 7 days from Time point #1 | plasma | 440 | 9
10 | 7 days from Time point #1 | urine | 45 | 9
10 | 14 days from Time point #1 | plasma | 695 | NA
10 | 21 days from Time point #1 | plasma | 502 | NA
11 | Time point #1 | plasma | 38 | 4
11 | Time point #1 | urine | 44 | 4
11 | 361 days from Time point #1 | plasma | 701 | NA
11 | 399 days from Time point #1 | plasma | 419 | NA
11 | 408 days from Time point #1 | urine | 8 | NA
11 | 415 days from Time point #1 | urine | 29 | NA
11 | 436 days from Time point #1 | urine | 31 | NA
11 | 450 days from Time point #1 | urine | 38 | NA
11 | 464 days from Time point #1 | urine | 25 | NA
11 | 485 days from Time point #1 | plasma | 708 | NA
13 | Time point #1 | plasma | 28 | 2
13 | Time point #1 | urine | 14 | 2
13 | 408 days from Time point #1 | urine | 53 | NA
13 | 415 days from Time point #1 | urine | 58 | NA
13 | 427 days from Time point #1 | plasma | 829 | NA
13 | 450 days from Time point #1 | urine | 14 | NA
13 | 457 days from Time point #1 | urine | 27 | NA
14 | Time point #1 | plasma | 208 | 4
14 | Time point #1 | urine | 7 | 4
14 | 51 days from Time point #1 | urine | 23 | NA
14 | 361 days from Time point #1 | plasma | 577 | 7
14 | 361 days from Time point #1 | urine | 41 | 7
14 | 436 days from Time point #1 | urine | 51 | NA
14 | 485 days from Time point #1 | plasma | 836 | NA
15 | Time point #1 | plasma | 543 | 8
15 | Time point #1 | urine | 52 | 8
19 | Time point #1 | plasma | 199 | NA
19 | 38 days from Time point #1 | plasma | 474 | NA
19 | 54 days from Time point #1 | plasma | 556 | NA
19 | 75 days from Time point #1 | urine | 21 | NA
22 | Time point #1 | urine | 12 | NA
22 | 128 days from Time point #1 | plasma | 57 | 3
22 | 128 days from Time point #1 | urine | 25 | 3
22 | 361 days from Time point #1 | plasma | 348 | 2
22 | 361 days from Time point #1 | urine | 21 | 2
22 | 450 days from Time point #1 | urine | 58 | NA
22 | 457 days from Time point #1 | urine | 52 | NA
22 | 464 days from Time point #1 | urine | 33 | NA
23 | Time point #1 | urine | 19 | NA
24 | Time point #1 | plasma | 836 | NA
24 | 9 days from Time point #1 | plasma | 424 | 5
24 | 9 days from Time point #1 | urine | 30 | 5
24 | 16 days from Time point #1 | urine | 1 | NA
24 | 28 days from Time point #1 | plasma | 818 | NA
24 | 51 days from Time point #1 | plasma | 406 | 3
24 | 51 days from Time point #1 | urine | 11 | 3
24 | 58 days from Time point #1 | plasma | 645 | 6
24 | 58 days from Time point #1 | urine | 21 | 6
24 | 65 days from Time point #1 | plasma | 508 | 9
24 | 65 days from Time point #1 | urine | 34 | 9
24 | 97 days from Time point #1 | plasma | 508 | NA
24 | Time point #1 | urine | 23 | NA
25 | Time point #1 | plasma | 538 | NA
25 | 54 days from Time point #1 | plasma | 71 | 1
25 | 54 days from Time point #1 | urine | 5 | 1
25 | 89 days from Time point #1 | urine | 20 | NA
29 | Time point #1 | plasma | 464 | NA
29 | 47 days from Time point #1 | urine | 21 | NA
29 | 54 days from Time point #1 | plasma | 565 | NA
29 | 103 days from Time point #1 | urine | 11 | NA
30 | Time point #1 | plasma | 430 | 11
30 | Time point #1 | urine | 46 | 11
30 | 7 days from Time point #1 | plasma | 777 | NA
30 | 28 days from Time point #1 | urine | 30 | NA
30 | 49 days from Time point #1 | plasma | 689 | NA
31 | Time point #1 | plasma | 158 | 7
31 | Time point #1 | urine | 32 | 7
31 | 114 days from Time point #1 | plasma | 413 | NA
31 | 121 days from Time point #1 | plasma | 502 | NA
31 | 361 days from Time point #1 | plasma | 448 | NA
31 | 399 days from Time point #1 | plasma | 658 | NA
31 | 464 days from Time point #1 | urine | 53 | NA
36 | Time point #1 | urine | 21 | NA
39 | Time point #1 | plasma | 410 | 8
39 | Time point #1 | urine | 38 | 8
39 | 51 days from Time point #1 | plasma | 172 | 4
39 | 51 days from Time point #1 | urine | 28 | 4
39 | 361 days from Time point #1 | plasma | 609 | NA
39 | 408 days from Time point #1 | urine | 26 | NA
39 | 471 days from Time point #1 | plasma | 611 | NA
42 | Time point #1 | plasma | 31 | NA
44 | Time point #1 | urine | 21 | NA
44 | 121 days from Time point #1 | plasma | 327 | 6
44 | 121 days from Time point #1 | urine | 17 | 6
45 | Time point #1 | plasma | 413 | 10
45 | Time point #1 | urine | 30 | 10
45 | 107 days from Time point #1 | plasma | 343 | 8
45 | 107 days from Time point #1 | urine | 19 | 8
45 | 361 days from Time point #1 | plasma | 824 | 11
45 | 361 days from Time point #1 | urine | 51 | 11
45 | 415 days from Time point #1 | urine | 13 | NA
45 | 427 days from Time point #1 | plasma | 423 | NA
45 | 436 days from Time point #1 | urine | 54 | NA
45 | 496 days from Time point #1 | plasma | 513 | NA
46 | Time point #1 | plasma | 42 | 1
46 | Time point #1 | urine | 31 | 1
46 | 14 days from Time point #1 | plasma | 28 | NA
46 | 14 days from Time point #1 | urine | 12 | NA
46 | 21 days from Time point #1 | plasma | 117 | 4
46 | 21 days from Time point #1 | urine | 14 | 4
46 | 28 days from Time point #1 | plasma | 271 | 7
46 | 28 days from Time point #1 | urine | 41 | 7
46 | 56 days from Time point #1 | plasma | 278 | NA
46 | 63 days from Time point #1 | plasma | 116 | NA
46 | 70 days from Time point #1 | plasma | 122 | NA
48 | Time point #1 | plasma | 27 | NA
48 | 49 days from Time point #1 | plasma | 135 | 4
48 | 49 days from Time point #1 | urine | 13 | 4
49 | Time point #1 | plasma | 41 | NA
49 | 47 days from Time point #1 | plasma | 468 | 5
49 | 47 days from Time point #1 | urine | 11 | 5
49 | 96 days from Time point #1 | plasma | 560 | 4
49 | 96 days from Time point #1 | urine | 7 | 4
49 | 103 days from Time point #1 | urine | 2 | NA
49 | 117 days from Time point #1 | plasma | 408 | NA
49 | 124 days from Time point #1 | plasma | 478 | NA
170 | Time point #1 | plasma | 183 | 5
170 | Time point #1 | urine | 22 | 5
201 | Time point #1 | plasma | 688 | 10
201 | Time point #1 | urine | 42 | 10
201 | 47 days from Time point #1 | urine | 57 | NA
201 | 54 days from Time point #1 | urine | 53 | NA
201 | 89 days from Time point #1 | urine | 24 | NA
201 | 103 days from Time point #1 | plasma | 598 | 9
201 | 103 days from Time point #1 | urine | 28 | 9
201 | 117 days from Time point #1 | plasma | 489 | NA
202 | Time point #1 | urine | 17 | NA
202 | 47 days from Time point #1 | urine | 11 | NA
202 | 54 days from Time point #1 | urine | 4 | NA
203 | Time point #1 | plasma | 70 | NA
203 | 38 days from Time point #1 | plasma | 585 | NA
203 | 47 days from Time point #1 | plasma | 634 | 9
203 | 47 days from Time point #1 | urine | 23 | 9
203 | 54 days from Time point #1 | plasma | 361 | 4
203 | 54 days from Time point #1 | urine | 15 | 4
203 | 66 days from Time point #1 | plasma | 511 | NA
203 | 89 days from Time point #1 | plasma | 818 | 11
203 | 89 days from Time point #1 | urine | 13 | 11
203 | 96 days from Time point #1 | plasma | 470 | 6
203 | 96 days from Time point #1 | urine | 18 | 6
203 | 103 days from Time point #1 | plasma | 488 | NA
203 | 110 days from Time point #1 | plasma | 812 | NA
204 | Time point #1 | plasma | 83 | NA
205 | Time point #1 | plasma | 357 | 4
205 | Time point #1 | urine | 46 | 4
206 | Time point #1 | plasma | 375 | NA
206 | 89 days from Time point #1 | plasma | 651 | NA
206 | 103 days from Time point #1 | plasma | 654 | NA
206 | 110 days from Time point #1 | plasma | 784 | NA
206 | 117 days from Time point #1 | plasma | 717 | NA
207 | Time point #1 | plasma | 786 | NA
207 | 47 days from Time point #1 | urine | 13 | NA
207 | 54 days from Time point #1 | plasma | 814 | NA
208 | Time point #1 | plasma | 647 | NA
209 | Time point #1 | plasma | 586 | NA
209 | 103 days from Time point #1 | urine | 52 | NA
210 | Time point #1 | plasma | 667 | NA
210 | 47 days from Time point #1 | urine | 26 | NA
211 | Time point #1 | plasma | 398 | NA
212 | Time point #1 | plasma | 463 | NA
213 | Time point #1 | urine | 60 | NA
213 | 7 days from Time point #1 | urine | 37 | NA
213 | 19 days from Time point #1 | plasma | 431 | NA
213 | 28 days from Time point #1 | urine | 50 | NA
214 | Time point #1 | urine | 9 | NA
214 | 103 days from Time point #1 | urine | 31 | NA
215 | Time point #1 | urine | 15 | NA
216 | Time point #1 | plasma | 509 | 8
216 | Time point #1 | urine | 35 | 8
216 | 89 days from Time point #1 | urine | 28 | NA
216 | 96 days from Time point #1 | urine | 30 | NA
218 | Time point #1 | urine | 18 | NA
218 | 117 days from Time point #1 | plasma | 393 | NA
220 | Time point #1 | plasma | 707 | NA
221 | Time point #1 | plasma | 377 | NA
221 | 9 days from Time point #1 | plasma | 511 | 6
221 | 9 days from Time point #1 | urine | 27 | 6
221 | 16 days from Time point #1 | urine | 63 | NA
221 | 28 days from Time point #1 | plasma | 365 | NA
221 | 51 days from Time point #1 | plasma | 838 | NA
221 | 58 days from Time point #1 | plasma | 773 | NA
221 | 65 days from Time point #1 | urine | 13 | NA
221 | 72 days from Time point #1 | plasma | 587 | NA
221 | 79 days from Time point #1 | plasma | 760 | NA
221 | 86 days from Time point #1 | plasma | 745 | NA
222 | Time point #1 | plasma | 292 | NA
223 | Time point #1 | urine | 7 | NA
223 | 89 days from Time point #1 | urine | 41 | NA
223 | 103 days from Time point #1 | plasma | 588 | 4
223 | 103 days from Time point #1 | urine | 28 | 4
224 | Time point #1 | plasma | 774 | NA
224 | 66 days from Time point #1 | urine | 51 | NA
224 | 103 days from Time point #1 | urine | 22 | NA
226 | Time point #1 | plasma | 413 | NA
226 | 38 days from Time point #1 | plasma | 791 | NA
226 | 54 days from Time point #1 | urine | 8 | NA
226 | 103 days from Time point #1 | plasma | 435 | NA
226 | 124 days from Time point #1 | plasma | 568 | NA
227 | Time point #1 | plasma | 813 | NA
227 | 38 days from Time point #1 | plasma | 673 | NA
227 | 54 days from Time point #1 | plasma | 491 | 9
227 | 54 days from Time point #1 | urine | 21 | 9
227 | 66 days from Time point #1 | plasma | 666 | NA
227 | 89 days from Time point #1 | plasma | 445 | NA
227 | 96 days from Time point #1 | urine | 54 | NA
227 | 103 days from Time point #1 | plasma | 619 | NA
227 | 135 days from Time point #1 | plasma | 570 | NA
228 | Time point #1 | plasma | 774 | 11
228 | Time point #1 | urine | 44 | 11
228 | 35 days from Time point #1 | urine | 52 | NA
228 | 42 days from Time point #1 | urine | 54 | NA

*denotes samples collected on an unspecified date

Fig. 1

Study Workflow.

Fig. 2

CircRNAs were predicted from 134 plasma (a,b) and 114 urine (c,d) samples using 6 different bioinformatic tools. 965 circRNA were identified by all 6 tools in plasma (a; red bar), and 72 circRNA were identified by all 6 tools in urine (c; red bar). Genomic features located within predicted back-spliced junctions in plasma (b) and urine (d), respectively.

As there are few datasets cataloging circular RNAs in clinically relevant biofluids, we expect these data to contribute to the characterization of circRNAs in young, healthy males. While this dataset may serve as a direct comparator for studies of concussion or other conditions more prevalent in young men, we also expect it to help build a broader assessment of the circRNAs present in healthy populations.

Methods

Sample collection and participants

Samples were collected from healthy male volunteers aged 18–25, with consent and approval from the Western Institutional Review Board (WIRB; study ID #1307009395). All participants provided written consent prior to enrollment. We obtained plasma (n = 134) and urine (n = 114) samples from 54 healthy male volunteers. In 71.4% of participants, both biofluid types were collected from the same individual. Blood samples were collected in EDTA tubes, and urine was collected in sterile cups. After collection, samples were placed in a cooler with ice packs and transported from Arizona State University to the Translational Genomics Research Institute within 2–3 hours of collection. Blood samples were centrifuged at 1,320 × g for 10 minutes at 4 °C, and 1 mL aliquots of plasma were collected in RNase/DNase-free microcentrifuge tubes (VWR) and stored at −80 °C. Urine samples were centrifuged at 1,900 × g for 10 minutes at 4 °C, and 15 mL aliquots were collected in 50 mL conical tubes and stored at −80 °C.

RNA isolation, library preparation, and sequencing

For plasma samples, total RNA was isolated from 1 mL plasma using the mirVana PARIS RNA and Native Protein Purification Kit (Thermo Fisher, Cat. No.: AM1556) as in Burgos et al.[42], treated with the DNA-free DNA Removal Kit (Thermo Fisher, Cat. No.: AM1906), and purified and concentrated with RNA Clean & Concentrator-5 columns (Zymo Research, Cat. No.: R1016) following Appendix C of the kit’s protocol. For urine samples, total RNA was isolated from 15 mL urine using Norgen’s Urine Total RNA Purification Maxi Kit (Slurry Format) (Norgen, Cat. No.: 29600), treated with the RNase-Free DNase Set (Qiagen, Cat. No.: 79254), and concentrated in a vacuum concentrator. The isolated RNA was quantitated with the Quant-iT RiboGreen RNA Assay (Thermo Fisher, Cat. No.: R11490). Samples were not ribo-depleted; double-stranded cDNA was synthesized from 10 ng total RNA with the SMARTer Universal Low Input RNA Kit for Sequencing (Clontech, Cat. No.: 634940) using thirteen PCR cycles. The double-stranded cDNA was quantitated with the Qubit dsDNA HS Assay Kit (Thermo Fisher, Cat. No.: Q32854). For each healthy control sample, Illumina-compatible libraries were synthesized from 2 ng double-stranded cDNA with Clontech’s Low Input Library Prep Kit (Clontech, Cat. No.: 634947) using four mandatory PCR cycles plus ten additional cycles. Each library was measured for size with Agilent’s High Sensitivity D1000 ScreenTape and reagents (Agilent, Cat. No.: 5067-5602 & 5067-5585) and for concentration with the KAPA SYBR FAST Universal qPCR Kit (Kapa Biosystems, Cat. No.: KK4824). Libraries were then combined into equimolar pools, and each pool was measured for size and concentration. Pools were clustered onto a paired-end flow cell (Illumina, Cat. No.: PE-401-3001) with a 20% v/v PhiX v3 spike-in (Illumina, Cat. No.: FC-110-3001) and sequenced on Illumina’s HiSeq 2500 with TruSeq v3 chemistry (Illumina, Cat. No.: FC-401-3002). The first and second reads were each 83 bases.
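Libraries were pooled in equimolar amounts after qPCR quantification. As a minimal sketch of the arithmetic behind that step, assuming each library concentration is already expressed in nM (for example, the KAPA qPCR result after any fragment-size adjustment) and using a hypothetical target of 20 fmol per library, the volume to pipette is the target amount divided by the concentration, because 1 nM equals 1 fmol/µL. The function name and example concentrations are illustrative and are not taken from the study.

```python
def pooling_volumes_ul(library_nm: dict[str, float], fmol_per_library: float) -> dict[str, float]:
    """Return the volume (µL) of each library that contributes `fmol_per_library` fmol.

    Uses 1 nM == 1 fmol/µL, so volume (µL) = amount (fmol) / concentration (nM).
    """
    if fmol_per_library <= 0:
        raise ValueError("fmol_per_library must be positive")
    volumes: dict[str, float] = {}
    for name, conc_nm in library_nm.items():
        if conc_nm <= 0:
            raise ValueError(f"library {name!r} has non-positive concentration {conc_nm}")
        volumes[name] = fmol_per_library / conc_nm
    return volumes


if __name__ == "__main__":
    # Hypothetical concentrations (nM); not measurements from this study.
    concentrations = {"lib_A": 4.0, "lib_B": 10.0, "lib_C": 2.5}
    for name, vol in pooling_volumes_ul(concentrations, fmol_per_library=20.0).items():
        print(f"{name}: {vol:.2f} µL")
```

With these hypothetical inputs the pool receives 5.00, 2.00, and 8.00 µL of the three libraries, so each contributes the same 20 fmol.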

CircRNA prediction

Samples were demultiplexed and raw FASTQ files were generated using CASAVA (v1.8.2, Illumina). Raw FASTQ files were trimmed using cutadapt (v1.9) with a quality score cutoff of 30 and a minimum length of 30 bp[43]. For each sample, six different algorithms (Table 2) were used to predict circRNAs: KNIFE v1.4[44], find_circ[21], MapSplice2[45], CIRCexplorer[46], CIRI2[47], and DCC[48]. Indices of the GRCh37/hg19 genome were created using bwa and STAR v2.4.0j with default parameters[49,50]; bowtie and bowtie2 genome indices were downloaded with the KNIFE package[51,52]. Reads were mapped to the genome with the recommended aligner and alignment parameters for each program: STAR v2.4.0j for DCC and CIRCexplorer, bowtie2 v2.2.1 for find_circ and KNIFE, bowtie v0.12.9 for MapSplice2, and bwa v0.7.13 for CIRI2. CircRNA prediction was then completed with the suggested parameters for each program, except that a minimum 18-nt overlap on either side of the junction was required (Table 2). For each program, circRNAs were kept for downstream analysis if they 1) had 2 or more junction counts and 2) were identified in at least 5 samples.
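The per-program filter above can be sketched as follows. This is a minimal illustration, not the authors’ code: it assumes each program’s output has already been parsed into `(circ_id, sample_id, junction_count)` tuples, and it interprets the two criteria as “a sample supports a circRNA only if that sample has at least 2 junction reads, and the circRNA is kept if at least 5 samples support it”. The 18-nt overlap requirement is shown as a separate read-level check on hypothetical overhang lengths; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

MIN_JUNCTION_READS = 2  # "2 or more junction counts" (interpreted per sample; an assumption)
MIN_SAMPLES = 5         # "identified in at least 5 samples" for a given program
MIN_OVERHANG_NT = 18    # minimum overlap required on either side of the back-splice junction


def read_spans_junction(left_overhang: int, right_overhang: int,
                        min_overhang: int = MIN_OVERHANG_NT) -> bool:
    """True if a read covers at least `min_overhang` nt on both sides of the junction."""
    return left_overhang >= min_overhang and right_overhang >= min_overhang


def filter_program_calls(calls: Iterable[tuple[str, str, int]],
                         min_reads: int = MIN_JUNCTION_READS,
                         min_samples: int = MIN_SAMPLES) -> set[str]:
    """Return the circRNA IDs from ONE program that pass both filters."""
    supporting_samples: dict[str, set[str]] = defaultdict(set)
    for circ_id, sample_id, junction_count in calls:
        if junction_count >= min_reads:
            supporting_samples[circ_id].add(sample_id)
    return {circ for circ, samples in supporting_samples.items() if len(samples) >= min_samples}


if __name__ == "__main__":
    # Hypothetical calls: circ_A has >= 2 reads in 5 samples; circ_B only in 4.
    demo = [("circ_A", f"s{i}", 2) for i in range(5)]
    demo += [("circ_B", f"s{i}", 3) for i in range(4)] + [("circ_B", "s4", 1)]
    print(filter_program_calls(demo))                                # {'circ_A'}
    print(read_spans_junction(18, 25), read_spans_junction(17, 40))  # True False
```

Because the filter is applied per program, the same junction can pass for one tool and fail for another before the tools are intersected.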
Table 2

CircRNA program characteristics.

Program | Aligner | Version | Paired-End Read Aware | Annotation Aware | Default Junction Overlap | Adjusted Junction Overlap | Reference
KNIFE | bowtie2 | 1.4 | Yes | Yes | 13 nt | 18 nt | [44]
find_circ | bowtie2 | 1.0 | No | No | 18 nt | | [21]
MapSplice | bowtie | 2.1.8 | Yes | Yes | 10 nt | 18 nt | [45]
CIRCexplorer | STAR | 1.1.7 | No* | Yes | 15 nt** | 18 nt | [46]
CIRI | bwa | 2.0.1 | Yes | Yes | 19 nt*** | | [47]
DCC | STAR | 0.3.2 | Yes | Yes | 15 nt** | 18 nt | [48]

*The latest version of CIRCexplorer now supports paired-end reads.

**CIRCexplorer and DCC use the STAR chimeric junctions output, so the junction overlap for these tools is set by the splice junction parameters during STAR alignment.

***The default minimum seed length (k) for bwa mem is 19 nucleotides.


Analysis of predicted circRNA

The version of CIRCexplorer used here does not support paired-end data; therefore, circRNA prediction was performed on each mate of a read pair separately, and the results were combined for analysis. For each program, BED files containing count data were created from the output. CIRCexplorer, KNIFE, and find_circ report 0-based coordinates, whereas CIRI2, MapSplice, and DCC report 1-based coordinates; therefore, all coordinates were converted to a 0-based system for comparison. BED12 GRCh37 RefSeq gene annotation files were obtained from UCSC (http://genome.ucsc.edu/cgi-bin/hgTables), and bedtools v2.26.0 was used to infer genes from the reported back-splice junction locations[53]. Data were analyzed using R v3.3.2 (https://cran.r-project.org). UpSet plots were generated using the UpSetR v1.3.3 package[54].
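Because the tools disagree on coordinate conventions, junctions must be normalized before they can be intersected. A minimal sketch, assuming the 1-based tools (CIRI2, MapSplice, DCC) report fully closed intervals `[start, end]`; that assumption should be checked against each tool’s documentation. In BED’s 0-based, half-open convention only the start shifts by one, and the end coordinate is unchanged. The `Bed6` record and helper names are hypothetical.

```python
from typing import NamedTuple


class Bed6(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive (numerically equal to the 1-based inclusive end)
    name: str
    score: int   # here: back-spliced junction read count
    strand: str


def one_based_to_bed(chrom: str, start_1b: int, end_1b: int,
                     name: str, count: int, strand: str) -> Bed6:
    """Convert a 1-based, fully closed interval to a 0-based, half-open BED record."""
    if start_1b < 1 or end_1b < start_1b:
        raise ValueError(f"invalid 1-based interval: {start_1b}-{end_1b}")
    return Bed6(chrom, start_1b - 1, end_1b, name, count, strand)


def junction_key(rec: Bed6) -> tuple[str, int, int, str]:
    """Key used to match the same back-splice junction across tools."""
    return (rec.chrom, rec.start, rec.end, rec.strand)


def to_bed_line(rec: Bed6) -> str:
    """Tab-separated BED6 line."""
    return "\t".join(str(field) for field in rec)


if __name__ == "__main__":
    rec = one_based_to_bed("chr1", 1001, 2000, "circ_example", 5, "+")
    print(to_bed_line(rec))
```

Records already reported in 0-based form (CIRCexplorer, KNIFE, find_circ) would be built directly as `Bed6` values, so all six tools end up sharing one `junction_key` convention.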

Quantification of circRNA expression

CircRNA count data were obtained from each bioinformatic program. Junction reads per million (JRPM) were calculated relative to the total number of junction reads in each sample as identified by STAR (both canonical and chimeric), i.e., JRPM = (circRNA count / junction reads) × 1,000,000. The circular-to-linear ratio (CLR) for each circRNA was calculated as described previously[13,27] by counting the linearly spliced reads identified by STAR on the 5′ and 3′ flanks of each circRNA junction and dividing the back-spliced read count by the larger of the two flank counts, i.e., CLR = circRNA count / max(5′ linear junction count, 3′ linear junction count). To avoid division by zero, a pseudocount of 1 was added to the denominator when no linearly spliced reads were detected. The number of reads assigned to the transcriptome was calculated using featureCounts (subread v1.5.1) with the Ensembl75 gene annotation[55]. Differential expression analysis was performed using DESeq2 v1.14.1[56] after retaining only samples with at least 300 detected circRNAs and excluding circRNAs expressed in fewer than 50% of samples.
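The two normalizations and the pre-DESeq2 filter described above are small enough to state as code. This is a sketch under assumptions: counts are plain Python integers, the pseudocount is applied only when both flanks have zero linear reads (as the text describes), and the sample filter is applied before the circRNA prevalence filter, an ordering the text does not specify. Function names are hypothetical.

```python
from collections import Counter


def jrpm(circ_count: int, total_junction_reads: int) -> float:
    """Junction reads per million: circ count / all STAR junction reads (canonical + chimeric) x 1e6."""
    if total_junction_reads <= 0:
        raise ValueError("total_junction_reads must be positive")
    return circ_count * 1_000_000 / total_junction_reads


def circular_to_linear_ratio(circ_count: int, linear_5p: int, linear_3p: int) -> float:
    """CLR = back-spliced count / max(5' flank, 3' flank linear splice count).

    If neither flank has linear reads, a pseudocount of 1 is added to the (zero) denominator.
    """
    denominator = max(linear_5p, linear_3p)
    if denominator == 0:
        denominator += 1
    return circ_count / denominator


def filter_for_de(counts: dict[str, dict[str, int]],
                  min_circ_per_sample: int = 300,
                  min_fraction: float = 0.5) -> dict[str, dict[str, int]]:
    """counts[sample][circ_id] -> count. Keep samples with >= min_circ_per_sample detected
    circRNAs, then drop circRNAs detected in fewer than min_fraction of the kept samples."""
    kept = {s: c for s, c in counts.items()
            if sum(1 for v in c.values() if v > 0) >= min_circ_per_sample}
    if not kept:
        return {}
    prevalence = Counter(circ for c in kept.values() for circ, v in c.items() if v > 0)
    keep_circs = {circ for circ, n in prevalence.items() if n / len(kept) >= min_fraction}
    return {s: {circ: v for circ, v in c.items() if circ in keep_circs} for s, c in kept.items()}
```

For example, `jrpm(5, 2_000_000)` gives 2.5, `circular_to_linear_ratio(10, 4, 2)` gives 2.5, and `circular_to_linear_ratio(3, 0, 0)` falls back to the pseudocount and gives 3.0.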

DNA isolation and qRT-PCR

After centrifugation of blood samples, DNA was isolated from the buffy coat using the DNeasy Kit (Qiagen, Cat. No.: 69504). Previously isolated RNA from the same samples used for library preparation was selected for cDNA synthesis. cDNA was synthesized with random hexamers using the SuperScript III First-Strand Synthesis System for RT-PCR following the manufacturer’s protocol (Invitrogen, Cat. No.: 18080-051), with 3 ng of total RNA as input, and stored at −20 °C. Inward-facing custom primers (crossing the back-splice junction) were designed with Primer3, and LabReady primers (100 µM in IDTE pH 8.0) were ordered from Integrated DNA Technologies with standard desalting purification[57,58]. Real-time qRT-PCR was performed with SYBR Select Master Mix (Thermo Fisher, Cat. No.: 4472919) on a QuantStudio 7 (Applied Biosystems), with 0.2 µM of primer and 0.2 µL of cDNA template or 2 ng of gDNA template per 10 µL reaction. U6 was used as a positive control, and no-template controls (NTCs) were used as negative controls. All results are expressed as the mean of three independent reactions, with standard deviations below 0.5. The ReadqPCR v1.20.0 and NormqPCR v1.20.0 Bioconductor v3.4 packages were used for qRT-PCR data analysis[59].
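A small sketch of how triplicate Ct values could be summarized and checked against the reported criterion (standard deviation below 0.5). Two assumptions are not stated in the text: the criterion uses the sample standard deviation of Ct values, and a replicate reported as undetermined (represented here as `None` or NaN) marks the whole reaction set as a non-detect rather than being averaged. The function name is hypothetical; the study itself used the ReadqPCR and NormqPCR packages in R.

```python
import math
import statistics
from typing import Optional, Sequence


def summarize_replicates(cts: Sequence[Optional[float]], max_sd: float = 0.5):
    """Return (mean Ct, sample SD, passes_sd_check), or None if any replicate is undetermined."""
    if len(cts) < 2:
        raise ValueError("need at least two replicate Ct values")
    if any(ct is None or math.isnan(ct) for ct in cts):
        return None  # treated as not detected (an assumption, see lead-in)
    mean_ct = statistics.mean(cts)
    sd = statistics.stdev(cts)  # sample standard deviation (n - 1 denominator)
    return mean_ct, sd, sd < max_sd
```

For example, `summarize_replicates([27.3, 27.4, 27.5])` returns a mean of about 27.4 with an SD of about 0.1, which passes the check, whereas `[27.0, 27.0, 28.5]` has an SD of about 0.87 and fails it.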

Data Records

Raw FASTQ files for the RNAseq libraries were deposited in dbGaP (accession # phs001258.v2.p1) (https://identifiers.org/dbgap:phs001258.v2.p1)[60]. Data (circRNAs identified across all informatic tools and raw circRNA expression) are also provided in figshare: 10.6084/m9.figshare.c.5420832[61].

Technical Validation

CircRNA set size and genomic alignment

The set size (all circRNAs detected in any sample by a given tool) ranged from 1,835 to 7,462 in plasma and from 163 to 1,349 in urine (Table 3). In total, 965 circRNAs in plasma and 72 in urine were detected by all six tools (Fig. 2a,c, red bars; Table 4; full lists in figshare Files 1 and 2[61]). KNIFE predicted the most circRNAs per sample in both plasma and urine, while MapSplice predicted the fewest (Table 3). Table 5 shows the pairwise correlations between tools; CIRCexplorer and DCC had the highest correlation (0.945 in plasma and 0.916 in urine). Of the 72 circRNAs found in urine, 61 (85%) were also detected in plasma (Table 4). Figure 2b (plasma) and 2d (urine) show the number of detected circRNAs that span introns, exons, and UTRs. The majority of circRNAs identified in plasma and urine contain at least two exons and span an intron (671 in plasma and 52 in urine; green bars in Fig. 2b,d). A small number of circRNAs are derived from a single exon (15 in plasma and 2 in urine).
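The consensus set is a set intersection across the six tools, taken after the per-program filters and coordinate normalization. A minimal sketch, assuming each tool’s filtered calls are stored as a set of normalized junction keys (the `chrom:start-end:strand` string format below is hypothetical), also shows the arithmetic behind the urine-in-plasma overlap: 61 of 72 is about 84.7%, reported as 85%.

```python
def consensus_calls(calls_by_tool: dict[str, set[str]]) -> set[str]:
    """Junctions reported by every tool (intersection of all per-tool sets)."""
    if not calls_by_tool:
        return set()
    return set.intersection(*calls_by_tool.values())


def percent_shared(query: set[str], reference: set[str]) -> float:
    """Percentage of `query` junctions that are also present in `reference`."""
    if not query:
        return 0.0
    return 100 * len(query & reference) / len(query)


if __name__ == "__main__":
    # Hypothetical calls from three of the six tools.
    tools = {
        "CIRCexplorer": {"chr1:100-900:+", "chr2:50-700:-", "chr3:10-400:+"},
        "CIRI2": {"chr1:100-900:+", "chr2:50-700:-"},
        "DCC": {"chr1:100-900:+", "chr2:50-700:-", "chr5:1-300:+"},
    }
    print(sorted(consensus_calls(tools)))  # ['chr1:100-900:+', 'chr2:50-700:-']
    print(round(100 * 61 / 72, 1))         # 84.7
```

A strict intersection trades sensitivity for confidence: a junction missed by any single tool, for example because of that tool’s overlap threshold, drops out of the consensus.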
Table 3

CircRNA totals detected across six informatic tools in plasma and urine.

Tool | Plasma: total circRNA | Plasma: mean circRNA/sample | Urine: total circRNA | Urine: mean circRNA/sample
CIRCexplorer | 6,297 | 909 | 1,142 | 119
CIRI2 | 6,789 | 1,075 | 1,205 | 131
DCC | 7,159 | 1,009 | 1,287 | 132
find_circ | 2,916 | 396 | 438 | 44
KNIFE | 7,462 | 1,086 | 1,349 | 139
MapSplice | 1,835 | 279 | 163 | 17
Table 4

Number of circRNA detected in plasma and urine by all 6 bioinformatic tools.

Detection | Plasma (n = 134) | Urine (n = 114) | Both Plasma and Urine
Detected in at least 1 sample | 965 | 72 | 61
Detected in 10% of samples | 964 | 71 | 60
Detected in 20% of samples | 881 | 61 | 51
Detected in 30% of samples | 675 | 41 | 34
Detected in 40% of samples | 538 | 28 | 24
Detected in 50% of samples | 395 | 16 | 16
Detected in 60% of samples | 273 | 14 | 11
Detected in 70% of samples | 177 | 10 | 10
Detected in 80% of samples | 68 | 4 | 2
Detected in 90% of samples | 15 | 2 | 1
Detected in 100% of samples | 0 | 0 | 0
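Table 4 can be read as a prevalence filter applied to the consensus set. A sketch of that computation, assuming `detected_in` maps each consensus circRNA to the set of samples in which it was detected, and that the “Both” column is the intersection of the plasma and urine sets at the same threshold (a reading of the table, not a stated method). Integer arithmetic avoids floating-point edge cases at exact percentage boundaries; the function name and toy data are hypothetical.

```python
def circs_at_prevalence(detected_in: dict[str, set[str]], n_samples: int, percent: int) -> set[str]:
    """circRNAs detected in at least `percent`% of `n_samples` samples."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return {circ for circ, samples in detected_in.items()
            if len(samples) * 100 >= percent * n_samples}


if __name__ == "__main__":
    # Toy data: 4 plasma samples and 2 urine samples.
    plasma = {"a": {"p1", "p2", "p3", "p4"}, "b": {"p1"}, "c": {"p1", "p2"}}
    urine = {"a": {"u1", "u2"}, "c": {"u1"}, "d": {"u1", "u2"}}
    for pct in (25, 50, 100):
        p = circs_at_prevalence(plasma, 4, pct)
        u = circs_at_prevalence(urine, 2, pct)
        print(pct, len(p), len(u), len(p & u))
```

Because each biofluid is thresholded against its own sample count (134 plasma, 114 urine), the “Both” count can never exceed the smaller of the two per-biofluid counts, which matches every row of Table 4.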
Table 5

Pearson’s correlation of circRNA expression (JRPM) between informatic tools.

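Plasma
Tool | CIRCexplorer | CIRI | DCC | find_circ | KNIFE | MapSplice
CIRCexplorer | 1 | 0.878 | 0.945 | 0.838 | 0.845 | 0.798
CIRI | 0.878 | 1 | 0.882 | 0.836 | 0.841 | 0.908
DCC | 0.945 | 0.882 | 1 | 0.843 | 0.887 | 0.79
find_circ | 0.838 | 0.836 | 0.843 | 1 | 0.82 | 0.776
KNIFE | 0.845 | 0.841 | 0.887 | 0.82 | 1 | 0.773
MapSplice | 0.798 | 0.908 | 0.79 | 0.776 | 0.773 | 1
Urine
Tool | CIRCexplorer | CIRI | DCC | find_circ | KNIFE | MapSplice
CIRCexplorer | 1 | 0.824 | 0.916 | 0.74 | 0.824 | 0.767
CIRI | 0.824 | 1 | 0.869 | 0.738 | 0.843 | 0.817
DCC | 0.916 | 0.869 | 1 | 0.793 | 0.889 | 0.733
find_circ | 0.74 | 0.738 | 0.793 | 1 | 0.801 | 0.718
KNIFE | 0.824 | 0.843 | 0.889 | 0.801 | 1 | 0.709
MapSplice | 0.767 | 0.817 | 0.733 | 0.718 | 0.709 | 1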

Highly expressed, back-spliced junctions were validated by qRT-PCR

To validate predicted back-spliced junctions by qRT-PCR, we designed inward-facing primers for the 15 most highly expressed circRNAs in each biofluid and tested each primer pair in samples from 10 different individuals, using the same source RNA for cDNA synthesis that was used for RNAseq (Fig. 3a,b). Figure 3a shows that the 15 plasma circRNAs were detected in most of the 10 plasma samples. The number of positive samples for each circRNA is given in Table 6, alongside RNASeq detection of the same circRNAs in the same samples. In urine, 13 of the 15 primer pairs were validated. Detection in urine was sparse, with fewer samples positive for each circRNA than in plasma (Fig. 3b and Table 6). For the two back-spliced junctions detected in RNASeq data but not validated by qRT-PCR in urine (circMYO5B and circPHC3), the primers may have failed, qPCR inhibitors may have been present in the samples, or the circRNA may have been absent. Two of the samples did not have enough assigned RNASeq reads to be included, so the total number of samples was 8. To rule out chimeric junctions that might be present in DNA or artifacts introduced during library preparation, we also used genomic DNA (gDNA) from each individual as a negative control. None of the 15 primer pairs used in plasma and urine samples produced a signal from gDNA (data not shown). Table 7 ranks each circRNA validated by qRT-PCR from highest to lowest expression and compares this ranking with the expression measured by sequencing; the rankings do not correlate well between the two platforms.
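The poor agreement between platforms can be quantified with Spearman’s rank correlation. In the plasma half of Table 7, the qRT-PCR rank orders circRNAs by ascending mean Ct and the RNA-Seq rank by descending mean JRPM, so rank 1 means most abundant on both platforms. With no ties, ρ = 1 − 6Σd²/(n(n² − 1)); for these 15 circRNAs Σd² = 602, giving ρ = 1 − 3612/3360 ≈ −0.075, essentially no rank agreement. This is an illustrative calculation, not one reported in the study, and the function name is hypothetical.

```python
# circRNA: (qRT-PCR rank, RNA-Seq rank), copied from the plasma half of Table 7.
PLASMA_RANKS = {
    "circUXS1": (1, 15), "circNRIP1": (2, 10), "circARHGEF12": (3, 5),
    "circMCU": (4, 1), "circPCMTD1": (5, 13), "circFIP1L1-1": (6, 2),
    "circRHBDD1": (7, 4), "circETFA": (8, 11), "circPRKCB": (9, 12),
    "circSMARCA5": (10, 9), "circSIAE": (11, 3), "circYPEL2": (12, 14),
    "circCDK17": (13, 7), "circFIP1L1-2": (14, 8), "circPOMT1": (15, 6),
}


def spearman_untied(ranks_x: list[int], ranks_y: list[int]) -> float:
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only when there are no ties."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("need two rank lists of equal length >= 2")
    if len(set(ranks_x)) != n or len(set(ranks_y)) != n:
        raise ValueError("tied ranks: use a tie-corrected Spearman instead")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


if __name__ == "__main__":
    qpcr, rnaseq = zip(*PLASMA_RANKS.values())
    print(round(spearman_untied(list(qpcr), list(rnaseq)), 3))
```

The urine half of Table 7 is not used here because two circRNAs have no qRT-PCR Ct (N/A), so their qRT-PCR ranks are not measured values.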
Fig. 3

(a,b) Highly-expressed, predicted back-spliced junctions were validated by qRT-PCR. qRT-PCR validation of the 15 most highly expressed circRNA found in plasma (a) and urine (b), respectively. Each circRNA was examined in 10 cDNA samples from the same source RNA as sequenced samples. (c,d) Circular-linear ratios are higher in plasma than urine. Linear splice junction expression plotted against circular splice junction expression in plasma (c) and urine (d). Points representing circRNA between 1-fold and 5-fold higher than their linear counterparts are blue; 5x or higher are red.

Table 6

circRNA detection in 10 samples by qRT-PCR and RNASeq.

Values are the number of samples, out of 10 tested, in which each circRNA was detected.

Plasma
circRNA         qRT-PCR (of 10)    RNASeq (of 10)
circARHGEF12    10                 9
circFIP1L1-1    10                 9
circMCU         10                 9
circRHBDD1      10                 9
circSIAE        9                  9
circCDK17       8                  9
circFIP1L1-2    10                 9
circNRIP1       10                 9
circPOMT1       10                 9
circSMARCA5     10                 9
circETFA        10                 7
circPCMTD1      10                 6
circPRKCB       10                 9
circUXS1        10                 9
circYPEL2       10                 9

Urine
circRNA         qRT-PCR (of 10)    RNASeq (of 10)
circPHC3        0                  8
circPOMT1       4                  8
circRHBDD1      2                  6
circSMARCA5     7                  7
circYPEL2       3                  7
circCDYL2       3                  4
circFARSA       3                  7
circPAPOLA      3                  7
circRBM23       5                  4
circUBAP2       6                  6
circARHGEF12    6                  7
circDMXL1       1                  5
circFIP1L1      2                  7
circMYO5B       0                  7
circSTK39       3                  7
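The validation counts reported above can be re-derived from Table 6. The Python sketch below transcribes the table and tallies validated primer pairs and total positive samples per biofluid. It assumes that a primer pair counts as "validated" when it was positive in at least one of the 10 samples. That interpretation is consistent with the 13 urine pairs reported above, but the authors do not state it as a definition. The helper name `summarize` is hypothetical.

```python
# Detection counts transcribed from Table 6:
# circRNA -> (qRT-PCR-positive samples, RNASeq-positive samples), out of 10 tested.
PLASMA = {
    "circARHGEF12": (10, 9), "circFIP1L1-1": (10, 9), "circMCU": (10, 9),
    "circRHBDD1": (10, 9), "circSIAE": (9, 9), "circCDK17": (8, 9),
    "circFIP1L1-2": (10, 9), "circNRIP1": (10, 9), "circPOMT1": (10, 9),
    "circSMARCA5": (10, 9), "circETFA": (10, 7), "circPCMTD1": (10, 6),
    "circPRKCB": (10, 9), "circUXS1": (10, 9), "circYPEL2": (10, 9),
}
URINE = {
    "circPHC3": (0, 8), "circPOMT1": (4, 8), "circRHBDD1": (2, 6),
    "circSMARCA5": (7, 7), "circYPEL2": (3, 7), "circCDYL2": (3, 4),
    "circFARSA": (3, 7), "circPAPOLA": (3, 7), "circRBM23": (5, 4),
    "circUBAP2": (6, 6), "circARHGEF12": (6, 7), "circDMXL1": (1, 5),
    "circFIP1L1": (2, 7), "circMYO5B": (0, 7), "circSTK39": (3, 7),
}


def summarize(table):
    """Return (validated pair count, sorted non-validated names, total qRT-PCR positives).

    ASSUMPTION: a primer pair is "validated" if it was positive in >= 1 sample.
    """
    validated = [name for name, (pcr, _) in table.items() if pcr > 0]
    not_validated = sorted(name for name, (pcr, _) in table.items() if pcr == 0)
    pcr_positives = sum(pcr for pcr, _ in table.values())
    return len(validated), not_validated, pcr_positives


for label, table in (("plasma", PLASMA), ("urine", URINE)):
    n_valid, missing, positives = summarize(table)
    print(f"{label}: {n_valid}/{len(table)} primer pairs validated, "
          f"{positives}/{10 * len(table)} positive samples, not validated: {missing}")
```

Run as-is, this reports 15/15 validated pairs and 147/150 positive samples in plasma. In urine it reports 13/15 validated pairs and 48/150 positive samples, with circMYO5B and circPHC3 as the non-validated pairs.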
Table 7

qRT-PCR and RNA-Seq expression of the 15 most highly expressed circRNAs in plasma and urine.

Rank 1 indicates the highest expression (lowest mean Ct for qRT-PCR; highest mean JRPM for RNA-Seq). N/A: not detected by qRT-PCR.

Plasma
circRNA         Mean Ct   qRT-PCR rank   Mean JRPM   RNA-Seq rank
circUXS1        27.41     1              51.21       15
circNRIP1       27.96     2              89.8        10
circARHGEF12    28.04     3              110.13      5
circMCU         28.53     4              415.78      1
circPCMTD1      28.8      5              60.09       13
circFIP1L1-1    28.99     6              250.67      2
circRHBDD1      29.75     7              217.81      4
circETFA        30.05     8              66.35       11
circPRKCB       30.16     9              65.81       12
circSMARCA5     30.64     10             99.59       9
circSIAE        31.41     11             234.59      3
circYPEL2       31.88     12             51.44       14
circCDK17       32.12     13             102.04      7
circFIP1L1-2    32.36     14             100.46      8
circPOMT1       32.6      15             106.79      6

Urine
circRNA         Mean Ct   qRT-PCR rank   Mean JRPM   RNA-Seq rank
circARHGEF12    31.27     1              5.27        15
circRBM23       31.46     2              8.21        5
circUBAP2       31.68     3              6.49        10
circCDYL2       31.93     4              6.72        8
circPAPOLA      32.64     5              6.61        9
circSMARCA5     32.72     6              7.85        6
circRHBDD1      33.22     7              10.72       3
circYPEL2       33.25     8              9.3         4
circDMXL1       33.63     9              6.29        11
circPOMT1       33.67     10             28.46       2
circFARSA       33.75     11             7.5         7
circFIP1L1-2    33.8      12             5.61        14
circSTK39       34.45     13             6.24        12
circMYO5B       N/A       14             5.7         13
circPHC3        N/A       15             81.19       1
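A rank correlation can quantify the observation that the qRT-PCR and RNA-Seq ranks correlate poorly. The sketch below is not the authors' analysis. It recomputes both plasma rankings from the mean Ct and mean JRPM values in Table 7, treating a lower Ct as higher expression. It then applies the Spearman formula for untied ranks, ρ = 1 − 6Σd² / (n(n² − 1)). The function names are hypothetical.

```python
# Plasma rows from Table 7: (circRNA, mean Ct, mean JRPM).
PLASMA_T7 = [
    ("circUXS1", 27.41, 51.21), ("circNRIP1", 27.96, 89.8),
    ("circARHGEF12", 28.04, 110.13), ("circMCU", 28.53, 415.78),
    ("circPCMTD1", 28.8, 60.09), ("circFIP1L1-1", 28.99, 250.67),
    ("circRHBDD1", 29.75, 217.81), ("circETFA", 30.05, 66.35),
    ("circPRKCB", 30.16, 65.81), ("circSMARCA5", 30.64, 99.59),
    ("circSIAE", 31.41, 234.59), ("circYPEL2", 31.88, 51.44),
    ("circCDK17", 32.12, 102.04), ("circFIP1L1-2", 32.36, 100.46),
    ("circPOMT1", 32.6, 106.79),
]


def rank(values, descending=False):
    """1-based ranks (rank 1 = first after sorting); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_untied(x_ranks, y_ranks):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    n = len(x_ranks)
    d_squared = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Lower Ct means more template, so ascending Ct gives rank 1 = highest expression.
pcr_ranks = rank([ct for _, ct, _ in PLASMA_T7])
# Higher JRPM means more junction reads, so rank in descending order.
seq_ranks = rank([jrpm for _, _, jrpm in PLASMA_T7], descending=True)
rho = spearman_untied(pcr_ranks, seq_ranks)
print(f"plasma Spearman rho = {rho:.3f}")
```

For the plasma values this gives ρ = −0.075, which indicates essentially no monotonic relationship between the two platforms' rankings. The same approach applies to the urine rows, although the two circRNAs marked N/A would need an explicit ranking rule.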

Circular-to-linear RNA ratios

Most circRNAs are expressed at low levels relative to their linear counterparts. However, a number of circular transcripts have been described as more abundant than their linear host, both cellularly and extracellularly[23,27,62,63]. We examined the circular-to-linear ratio (CLR) of circRNA transcripts in plasma and urine by taking the ratio of back-spliced junction counts to the count of the nearest linear 5′ or 3′ splice junction, as described previously[13,24,27]. Extracellular RNA is often fragmented and may have a 3′ bias[64]. Therefore, before examining circRNA expression relative to host genes, we calculated the overall 5′-to-3′ coverage of linear transcripts and found no bias in our samples. On average, 28.5% of circRNA transcripts in plasma and 21.5% in urine had higher expression than their linear host gene (Fig. 3c, plasma; Fig. 3d, urine).
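The following minimal Python sketch illustrates the CLR calculation. The text compares the back-spliced count with the nearest 5′ or 3′ linear splice junction but does not specify how the two are combined. This sketch therefore assumes that the larger of the two flanking linear counts is used as the denominator, a conservative choice that yields the smaller ratio. It returns infinity when no linear reads are present. The fold-change bins mirror the colors in Fig. 3c,d. The `Junction` record and the toy counts are hypothetical and are not values from this dataset.

```python
from dataclasses import dataclass


@dataclass
class Junction:
    """Read counts for one circRNA in one sample (hypothetical record layout)."""
    name: str
    circular_reads: int   # reads spanning the back-spliced junction
    linear_5p_reads: int  # reads on the nearest linear 5' splice junction
    linear_3p_reads: int  # reads on the nearest linear 3' splice junction


def circular_to_linear_ratio(j: Junction) -> float:
    # ASSUMPTION: the larger flanking linear count is the denominator,
    # which gives the more conservative (smaller) ratio.
    linear = max(j.linear_5p_reads, j.linear_3p_reads)
    if linear == 0:
        return float("inf")  # no linear reads: the circular form dominates
    return j.circular_reads / linear


def fold_bin(clr: float) -> str:
    """Bins mirroring Fig. 3c,d: >1- to <5-fold (blue), >=5-fold (red)."""
    if clr >= 5:
        return "red"
    if clr > 1:
        return "blue"
    return "not circular-dominant"


def fraction_circular_dominant(junctions) -> float:
    """Share of circRNAs whose circular count exceeds the linear count."""
    ratios = [circular_to_linear_ratio(j) for j in junctions]
    return sum(r > 1 for r in ratios) / len(ratios)


toy = [
    Junction("circA", circular_reads=12, linear_5p_reads=2, linear_3p_reads=3),
    Junction("circB", circular_reads=4, linear_5p_reads=10, linear_3p_reads=8),
    Junction("circC", circular_reads=30, linear_5p_reads=5, linear_3p_reads=1),
]
for j in toy:
    clr = circular_to_linear_ratio(j)
    print(j.name, clr, fold_bin(clr))
print("fraction circular-dominant:", fraction_circular_dominant(toy))
```

Using the smaller flanking count instead would classify more circRNAs as circular-dominant, so the choice of denominator directly affects percentages such as those reported above.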

Participants sequenced 5 or more times have less inter-sample variation

A notable feature of this dataset is that many participants were sampled longitudinally. This allows circRNA stability within individuals to be compared with stability across the entire dataset. Figure 4a,b shows longitudinal circRNA expression in the same participants in plasma and urine, respectively. Broadly, the heatmaps show similar expression patterns within the same participant over time. To assess variability within individuals, we calculated the coefficient of variation (CV) of circRNA expression, normalized as junction reads per million (JRPM). Here, we focus on participants sampled on 5 or more occasions over approximately one year. For both plasma and urine, the CV for each individual participant is displayed alongside the CV across all participant samples. The CV within individuals was significantly lower than across the entire dataset, indicating that circRNA expression patterns are consistent within an individual over time (Fig. 4c, plasma; Fig. 4d, urine).
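The within-participant versus whole-dataset comparison can be sketched for a single circRNA as follows. This sketch assumes that JRPM is computed against a per-sample total read count, since the exact denominator is not specified here. It also assumes that the CV is the sample standard deviation divided by the mean. As in the analysis above, participants with fewer than 5 samples are skipped. All function names and toy values are hypothetical.

```python
import statistics
from collections import defaultdict


def jrpm(junction_reads: int, total_reads: int) -> float:
    """Junction reads per million. ASSUMPTION: total_reads is the per-sample normalizer."""
    return junction_reads / total_reads * 1_000_000


def coefficient_of_variation(values) -> float:
    """Sample standard deviation divided by the mean (needs >= 2 values, nonzero mean)."""
    return statistics.stdev(values) / statistics.mean(values)


def cv_within_and_across(samples, min_visits=5):
    """samples: (participant_id, JRPM) pairs for one circRNA.

    Returns ({participant: CV} for participants with >= min_visits samples,
             CV over every sample in the dataset).
    """
    by_participant = defaultdict(list)
    for participant, value in samples:
        by_participant[participant].append(value)
    within = {
        participant: coefficient_of_variation(values)
        for participant, values in by_participant.items()
        if len(values) >= min_visits
    }
    across = coefficient_of_variation([value for _, value in samples])
    return within, across


# Toy JRPM data: two stable participants at different baseline levels, one sampled twice.
toy = ([("P1", v) for v in (10, 11, 9, 10, 10)]
       + [("P2", v) for v in (30, 29, 31, 30, 30)]
       + [("P3", v) for v in (20, 21)])
within, across = cv_within_and_across(toy)
print(within, across)
```

Differences in baseline level between participants inflate the dataset-wide CV. As a result, per-participant CVs fall below it when individuals are stable over time, which is the pattern shown in Fig. 4c,d.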
Fig. 4

Participants sequenced five or more times have less inter-sample variation. CircRNA populations identified in plasma (a,c) and urine (b,d) from participants sampled five or more times. (a,b) Heatmaps showing the log-normalized JRPM expression of plasma (a) and urine (b) samples taken longitudinally from the same participant. The coefficient of variation (CV) of circRNA expression is significantly lower across individual participant samples when compared to the entire dataset (c, plasma; and d, urine). ****p ≤ 0.0001.

Usage Notes

Because approaches to detecting circRNA from RNA-Seq data differ among the available tools, we employed 6 different bioinformatic tools: CIRCexplorer, CIRI2, DCC, KNIFE, find_circ, and MapSplice. We applied them to two clinically relevant biofluids, plasma and urine, using 134 and 114 samples, respectively. Most of these circRNA pipelines use an external aligner, such as bowtie, STAR, or bwa, to align reads to the genome and/or transcriptome (Table 2). After alignment, reads that align contiguously to the genome and/or transcriptome are filtered out, and the remaining unmapped reads are further filtered to identify back-spliced junctions. The circRNA identification algorithms differ in 1) how, if at all, paired-end reads and gene annotations are used, 2) the amount of overlap over the junction that a read must contain, 3) the types of junctions considered, and 4) various filtering steps (Table 1)[65]. Because identification can vary widely between tools[66-68], we generated a high-confidence set of circRNAs expressed in plasma and urine. Each circRNA in this set had to meet four requirements: 1) detection in at least 5 samples of the respective biofluid, 2) a minimum 18 nt overlap on each side of the junction, 3) at least two reads spanning the back-spliced junction, and 4) identification by all 6 tested bioinformatic tools. We also tested alignment parameters and their influence on the circRNA detection rate. The numbers of input reads, genome-mapped reads, and junction reads did not correlate well with the number of circRNAs detected per sample. Instead, the number of reads assigned to the transcriptome showed the strongest correlation (R² = 0.805; data not shown).
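The four high-confidence criteria can be expressed as a filter over per-sample junction calls. In this Python sketch the record layout and junction IDs are hypothetical. Criterion 4 is interpreted as requiring all six tools to report the junction in a given sample before that sample counts toward the 5-sample minimum. The authors' pipeline may apply the intersection differently, for example across the pooled dataset.

```python
from collections import defaultdict
from dataclasses import dataclass

TOOLS = frozenset({"CIRCexplorer", "CIRI2", "DCC", "KNIFE", "find_circ", "MapSplice"})


@dataclass
class JunctionCall:
    """One back-spliced junction in one sample (hypothetical record layout)."""
    junction_id: str
    sample_id: str
    read_overlaps: list  # (left_nt, right_nt) overlap of each spanning read
    tools: frozenset     # tools that reported this junction in this sample


def qualifying_reads(call: JunctionCall, min_overlap: int) -> int:
    """Count spanning reads with >= min_overlap nt on each side of the junction."""
    return sum(1 for left, right in call.read_overlaps if min(left, right) >= min_overlap)


def high_confidence(calls, min_samples=5, min_overlap=18, min_reads=2, tools=TOOLS):
    """Junction IDs passing criteria 2-4 in at least min_samples samples (criterion 1)."""
    passing_samples = defaultdict(set)
    for call in calls:
        # ASSUMPTION: every tool must report the junction in this same sample.
        if tools <= call.tools and qualifying_reads(call, min_overlap) >= min_reads:
            passing_samples[call.junction_id].add(call.sample_id)
    return {j for j, samples in passing_samples.items() if len(samples) >= min_samples}


good = [(20, 25), (18, 30)]  # two reads, each overlapping >= 18 nt on both sides
calls = [JunctionCall("chr1:1000-5000", f"S{i}", good, TOOLS) for i in range(5)]
calls += [JunctionCall("chr2:200-900", f"S{i}", good, TOOLS - {"KNIFE"}) for i in range(5)]
print(high_confidence(calls))
```

The second junction is dropped because one tool did not report it, even though it passes the read and sample thresholds. This reflects how requiring the intersection of all six tools trades sensitivity for confidence.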

Supplementary information

Supplemental Data File
Measurement(s): transcriptome • RNA (circular)
Technology Type(s): RNA sequencing
Factor Type(s): biofluid
Sample Characteristic - Organism: Homo sapiens
References (65 in total)

1.  Enhancements and modifications of primer design program Primer3.

Authors:  Triinu Koressaar; Maido Remm
Journal:  Bioinformatics       Date:  2007-03-22       Impact factor: 6.937

2.  circRNA biogenesis competes with pre-mRNA splicing.

Authors:  Reut Ashwal-Fluss; Markus Meyer; Nagarjuna Reddy Pamudurti; Andranik Ivanov; Osnat Bartok; Mor Hanan; Naveh Evantal; Sebastian Memczak; Nikolaus Rajewsky; Sebastian Kadener
Journal:  Mol Cell       Date:  2014-09-18       Impact factor: 17.970

3.  A comprehensive overview and evaluation of circular RNA detection tools.

Authors:  Xiangxiang Zeng; Wei Lin; Maozu Guo; Quan Zou
Journal:  PLoS Comput Biol       Date:  2017-06-08       Impact factor: 4.475

4.  The landscape of microRNA, Piwi-interacting RNA, and circular RNA in human saliva.

Authors:  Jae Hoon Bahn; Qing Zhang; Feng Li; Tak-Ming Chan; Xianzhi Lin; Yong Kim; David T W Wong; Xinshu Xiao
Journal:  Clin Chem       Date:  2014-11-06       Impact factor: 8.327

5.  BEDTools: a flexible suite of utilities for comparing genomic features.

Authors:  Aaron R Quinlan; Ira M Hall
Journal:  Bioinformatics       Date:  2010-01-28       Impact factor: 6.937

6.  Transcriptome-wide discovery of circular RNAs in Archaea.

Authors:  Miri Danan; Schraga Schwartz; Sarit Edelheit; Rotem Sorek
Journal:  Nucleic Acids Res       Date:  2011-12-02       Impact factor: 16.971

7.  Comparison of circular RNA prediction tools.

Authors:  Thomas B Hansen; Morten T Venø; Christian K Damgaard; Jørgen Kjems
Journal:  Nucleic Acids Res       Date:  2015-12-10       Impact factor: 16.971

8.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

9.  Biogenesis, identification, and function of exonic circular RNAs.

Authors:  Iju Chen; Chia-Ying Chen; Trees-Juen Chuang
Journal:  Wiley Interdiscip Rev RNA       Date:  2015-07-31       Impact factor: 9.957

10.  Translating RNA sequencing into clinical diagnostics: opportunities and challenges.

Authors:  Sara A Byron; Kendall R Van Keuren-Jensen; David M Engelthaler; John D Carpten; David W Craig
Journal:  Nat Rev Genet       Date:  2016-03-21       Impact factor: 53.242
