Enrico Gaffo, Annagiulia Bonizzato, Geertruy Te Kronnie, Stefania Bortoluzzi.
Abstract
Circular RNAs (circRNAs) are generated by backsplicing of immature RNA, forming covalently closed loops of intron/exon RNA molecules. The pervasiveness, evolutionary conservation, abundant and regulated expression, and posttranscriptional regulatory roles of circRNAs in eukaryotes have been appreciated and described only recently. Moreover, as easily detectable disease markers, circRNAs undoubtedly represent a molecular class with high bearing on molecular pathobiology. CircRNAs can be detected from RNA-seq data using computational methods that identify the sequence reads spanning backsplice junctions, which do not map colinearly to the reference genome. Several programs have been developed for this purpose, and critical assessments of the various strategies and tools suggested combining at least two methods as good practice to guarantee robust circRNA detection. Here, we present CirComPara (http://github.com/egaffo/CirComPara), an automated bioinformatics pipeline that detects, quantifies, and annotates circRNAs from RNA-seq data by running four different backsplice identification methods in parallel. CirComPara also quantifies linear RNAs and gene expression, and ultimately compares and correlates circRNA and gene/transcript expression levels. We applied our method to RNA-seq data of monocyte and macrophage samples in relation to haploinsufficiency of the RNA-binding splicing factor Quaking (QKI). The biological relevance of the results, in terms of the number, types, and variations of the circRNAs expressed, illustrates CirComPara's potential to expand knowledge of the transcriptome, add detail to the circRNAome, and facilitate further computational and experimental studies.
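To make "reads that do not map colinearly" concrete, here is a minimal Python sketch that classifies the two aligned pieces of a split read. It assumes the aligner has already split the read into two segments and reports each segment's offset along the read in transcript (sense) orientation; the `Segment` class and `backsplice_span` function are hypothetical illustrations, not part of CirComPara or of the tools it wraps, and real detectors apply further filters (for example on splice-site signals and read support).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Segment:
    """One aligned piece of a split (chimeric) read. Hypothetical structure."""
    read_offset: int  # position of the segment along the read, in transcript (sense) orientation
    chrom: str
    g_start: int      # genomic start of the aligned segment
    g_end: int        # genomic end of the aligned segment
    strand: str       # "+" or "-"


def backsplice_span(a: Segment, b: Segment) -> Optional[Tuple[str, int, int, str]]:
    """Return (chrom, start, end, strand) of the circRNA implied by a backsplice
    junction, or None if the two segments are colinear or not a candidate."""
    first, second = sorted((a, b), key=lambda s: s.read_offset)
    if first.chrom != second.chrom or first.strand != second.strand:
        return None  # different chromosome or strand: not a backsplice candidate
    if first.strand == "+":
        # Colinear splicing moves to higher coordinates along the read; a backsplice
        # jumps back upstream, so the later read part maps before the earlier one.
        if second.g_start < first.g_start:
            return (first.chrom, second.g_start, first.g_end, "+")
    else:
        # On the minus strand, transcript-upstream means higher genomic coordinates.
        if second.g_end > first.g_end:
            return (first.chrom, first.g_start, second.g_end, "-")
    return None
```

The returned span runs from the acceptor end of the upstream exon to the donor end of the downstream exon, i.e. the two backsplice ends that identify a circRNA.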
Keywords: circular RNA; CirComPara; Quaking; RNA‐seq; bioinformatics pipeline; monocytes
Year: 2017 PMID: 29657280 PMCID: PMC5832002 DOI: 10.3390/ncrna3010008
Source DB: PubMed Journal: Noncoding RNA ISSN: 2311-553X
Figure 1. (A) CirComPara workflow. Rounded boxes represent inputs; the tools currently used are shown as gray labels next to the corresponding pipeline level; dotted lines represent optional functions. (B–D) CirComPara summary plots of expressed circular RNAs (circRNAs): (B) absolute number of circRNAs detected by each method; (C) number of circRNAs commonly detected by two or more methods; (D) number of circRNAs expressed per sample, considering both the whole set of detected backsplices and the selected subset of circRNAs detected by at least two methods.
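The "detected by at least two methods" subset amounts to a vote across detectors. A minimal Python sketch, assuming every tool's output has already been converted to a common `chrom:start-end:strand` ID with one coordinate convention (the tools may report coordinates differently, e.g. 0- vs 1-based, so this harmonization step matters in practice). The function names are hypothetical, not CirComPara's code.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def consensus_circrnas(calls: Dict[str, Iterable[str]], min_methods: int = 2) -> Set[str]:
    """Return the backsplice IDs reported by at least `min_methods` detection methods."""
    support = Counter()
    for ids in calls.values():
        support.update(set(ids))  # set() so each method votes at most once per ID
    return {circ_id for circ_id, n in support.items() if n >= min_methods}


def per_method_counts(calls: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Number of distinct circRNAs detected by each method (cf. panel B)."""
    return {method: len(set(ids)) for method, ids in calls.items()}
```

Here `calls` would map each detector (e.g. `"CIRCexplorer"`, `"CIRI"`, `"find_circ"`, `"testrealign"`) to its list of IDs; raising `min_methods` trades sensitivity for robustness.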
Figure 2. CirComPara summary plots of circRNA and gene expression and their integration. (A,B) Twin boxplots of circRNA and gene expression levels per sample; (C,D) cumulative expression plots of circRNAs and genes per sample; (E) frequency distribution of the number of circRNAs per gene; (F) density distribution of pairwise circRNA/gene Spearman correlation values.
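Panel (F) rests on Spearman correlation between each circRNA and a gene across samples. A dependency-free Python sketch, assuming each circRNA is paired with its overlapping gene and both are given as per-sample expression vectors in the same sample order; the helper names are hypothetical. Spearman's coefficient is the Pearson correlation of ranks, with tied values sharing their mean rank.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return math.nan  # constant vector: correlation undefined
    return cov / (sx * sy)
```

Because only ranks matter, the coefficient is unaffected by circRNAs and genes being measured in different units. With only four samples it moves in coarse steps (0.2 when there are no ties), which shapes the distribution in panel (F).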
The 30 most expressed circular RNAs (circRNAs), with annotations, estimated expression levels (reads per million mapped reads; RPM), RNase R enrichment group from Hansen et al. [24], validation status reported in circBase, and references of studies that validated specific circRNAs. Mo = monocytes; Ma = macrophages; E = enriched; U = unvaried; NA = not assayed; VAL = validated.
| CircRNA ID | Overlapping Gene Ensembl ID | Overlapping Gene Symbol | CircRNA Category | QKI+/− Mo (SRR2923169) | QKI+/+ Mo (SRR2923170) | QKI+/− Ma (SRR2923171) | QKI+/+ Ma (SRR2923172) | RNase R Enrichment [24] | Validated (circBase) | Other Studies |
|---|---|---|---|---|---|---|---|---|---|---|
| 11:33286413-33287511:+ | ENSG00000110422 | HIPK3 | exonic | 2964 | 8861 | 12386 | 15822 | E | VAL | [ |
| 2:40428473-40430304:− | ENSG00000183023 | SLC8A1 | exonic | 2519 | 4845 | 10077 | 10604 | U | NA | |
| 12:108652272-108654410:− | ENSG00000110880 | CORO1C | exonic | 3090 | 3625 | 10701 | 10471 | E | VAL | |
| 17:20204333-20205912:+ | ENSG00000128487 | SPECC1 | exonic | 2341 | 3273 | 8062 | 8485 | U | NA | [ |
| 1:7777160-7778169:+ | ENSG00000049245 | VAMP3 | exonic | 8613 | 4941 | 4105 | 2832 | E\|U | NA | |
| 4:143543509-143543972:+ | ENSG00000153147 | SMARCA5 | exonic | 4458 | 6008 | 4288 | 4863 | E\|U | VAL | |
| 14:99458279-99465813:− | ENSG00000183576 | SETD3 | exonic | 5399 | 2854 | 5094 | 6114 | E | VAL | |
| 3:196391813-196403019:− | ENSG00000163960\| | UBXN7\|RNU6-1279P\| | exonic\|intergenic spanning gene | 2848 | 5594 | 5553 | 3853 | E | VAL | |
| 8:130152736-130180880:− | ENSG00000153317 | ASAP1 | exonic | 1823 | 2232 | 6744 | 6410 | E | NA | |
| 2:201145378-201149835:+ | ENSG00000003402 | CFLAR | exonic | 5294 | 3675 | 3775 | 3644 | E | NA | |
| 2:61522611-61533903:− | ENSG00000082898 | XPO1 | exonic | 8801 | 3638 | 1503 | 2329 | E | VAL | |
| 1:117402186-117420649:+ | ENSG00000198162 | MAN1A2 | exonic | 3498 | 4779 | 3812 | 4127 | U | VAL | |
| 8:130358017-130361771:− | ENSG00000153317 | ASAP1 | exonic | 0 | 250 | 8392 | 7447 | E | NA | |
| 3:149846011-149921227:+ | ENSG00000082996 | RNF13 | exonic | 1044 | 1278 | 6891 | 6829 | E\|U | VAL | |
| 13:32517857-32527532:− | ENSG00000244754 | N4BP2L2 | exonic | 5770 | 3191 | 2788 | 3627 | E | VAL | |
| 18:9182382-9221999:+ | ENSG00000265257\| | RP11-21J18.1\|ANKRD12 | exonic | 5431 | 3236 | 3041 | 2173 | U | VAL | |
| 21:15762891-15766141:+ | ENSG00000155313 | USP25 | exonic | 3122 | 2630 | 2896 | 4360 | E | VAL | |
| 15:64499293-64500166:+ | ENSG00000180357 | ZNF609 | exonic | 2770 | 4645 | 2109 | 3334 | E | NA | [ |
| 4:152411303-152412529:− | ENSG00000109670 | FBXW7 | exonic | 3297 | 4365 | 2566 | 2339 | E | NA | |
| 4:87195324-87195690:− | ENSG00000145332 | KLHL8 | exonic | 6312 | 2664 | 2162 | 1246 | E | NA | |
| 9:110972073-110973558:− | ENSG00000198121 | LPAR1 | exonic | 2039 | 1168 | 5094 | 3715 | E | NA | [ |
| 16:85633914-85634132:+ | ENSG00000131149 | GSE1 | exonic | 3340 | 4629 | 2089 | 1859 | E | NA | |
| 4:37631385-37638504:− | ENSG00000181826 | RELL1 | exonic | 3469 | 2260 | 3336 | 2653 | E | VAL | |
| 6:4891713-4892379:+ | ENSG00000153046 | CDYL | exonic | 1048 | 1845 | 3431 | 4272 | E | NA | [ |
| 5:73074742-73077493:+ | ENSG00000157107 | FCHO2 | exonic | 763 | 1482 | 4361 | 3953 | E | NA | |
| 5:137985257-137988315:− | ENSG00000031003 | FAM13B | exonic | 3018 | 1830 | 2530 | 2753 | E | NA | |
| 14:73147795-73148094:+ | ENSG00000080815 | PSEN1 | exonic | 3881 | 2625 | 2055 | 1408 | E | NA | |
| 12:32598497-32611283:+ | ENSG00000139132 | FGD4 | exonic | 1579 | 1639 | 3884 | 2764 | E\|U | NA | |
| 7:158759486-158764853:− | ENSG00000117868 | ESYT2 | exonic | 2734 | 4661 | 1026 | 1042 | E | VAL | |
| 8:37765526-37766355:+ | ENSG00000147471 | PROSC | exonic | 1980 | 2841 | 2712 | 1908 | E | VAL | |
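The circRNA IDs in this table follow a `chrom:start-end:strand` pattern, and expression is reported in RPM. A minimal Python sketch of both, assuming RPM means backsplice read count scaled by the sample's total mapped reads (the exact read counts and library sizes behind the table are not given here, and the coordinate convention of the IDs is not stated). The minus strand appears to be printed with the Unicode minus sign (U+2212), so the parser normalizes it; function names are hypothetical.

```python
from typing import Tuple


def parse_circ_id(circ_id: str) -> Tuple[str, int, int, str]:
    """Split 'chrom:start-end:strand' into (chrom, start, end, strand).

    A Unicode minus sign (U+2212) in the strand field is normalized to an
    ASCII hyphen."""
    chrom, span, strand = circ_id.split(":")
    start, end = (int(v) for v in span.split("-"))
    return chrom, start, end, strand.replace("\u2212", "-")


def rpm(backsplice_reads: int, mapped_reads: int) -> float:
    """Reads per million mapped reads (multiply first to limit rounding error)."""
    return backsplice_reads * 1_000_000 / mapped_reads
```

For example, `parse_circ_id("11:33286413-33287511:+")` splits the HIPK3 circRNA ID into its chromosome, backsplice ends, and strand.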
Figure 3. CircRNAome expression varies in relation to Quaking (QKI) haploinsufficiency during induced monocyte-to-macrophage differentiation. (A) Number of circRNAs expressed per sample; (B) waterfall plot of the log2 fold change (log2FC) of QKI+/− vs. QKI+/+ cells in the two cell types, for all expressed circRNAs; (C) number of circRNAs with |log2FC| > 1.5 when comparing QKI-haploinsufficient with control cells, considering monocytes and macrophages separately; (D) QKI circular isoforms detected from RNA-seq (the table reports, for each circRNA, the genomic coordinates of the backsplice ends, the expression level per sample, and the magnitude of the observed log2FC).
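The fold-change comparison in panels (B) and (C) can be sketched in a few lines of Python. This is an illustration under assumptions, not the figure's exact computation: the pseudocount of 1 is a choice made here to keep circRNAs with zero expression in one sample (such as the ASAP1 circRNA at 0 RPM in QKI+/− monocytes) finite, and CirComPara's own normalization may differ.

```python
import math


def log2fc(case: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 fold change of case (QKI+/-) over control (QKI+/+).

    The pseudocount is an assumption of this sketch; it avoids log2(0)."""
    return math.log2((case + pseudocount) / (control + pseudocount))


def strongly_changed(fc: float, threshold: float = 1.5) -> bool:
    """True when |log2FC| exceeds the threshold used in panel (C)."""
    return abs(fc) > threshold
```

Using the RPM values of the HIPK3 circRNA from the table of the 30 most expressed circRNAs, monocytes (2964 in QKI+/− vs. 8861 in QKI+/+) give a log2FC of about −1.58, past the 1.5 threshold, whereas macrophages (12386 vs. 15822) give about −0.35.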
List of the main software tools included in CirComPara, with a description of their function, reference, and version.
| Software Tool | Description | Citation/Website | Version |
|---|---|---|---|
| R | custom scripts | | 3.2.5 (2016-04-14) |
| Python | custom scripts | | 2.7.3 |
| SCons | script execution manager | | 2.5.0 |
| Trimmomatic | read preprocessing | [ | 0.36 |
| FastQC | read statistics | | 0.11.5 |
| HISAT2 | linear genome mapping | [ | 2.0.4 |
| CIRCexplorer | circRNA detection | [ | 1.1.10 |
| STAR | read alignment for CIRCexplorer | [ | 2.5.2a |
| CIRI | circRNA detection | [ | 2.0.2 |
| BWA | read alignment for CIRI | [ | 0.7.15-r1140 |
| find_circ | circRNA detection | [ | 1.2 |
| Bowtie2 | read alignment for find_circ | [ | 2.2.9 |
| testrealign | circRNA detection | [ | 0.1 |
| Segemehl | read alignment for testrealign | [ | 0.2.0-418 |
| Cufflinks | gene/transcript expression quantification and transcriptome reconstruction | [ | 2.2.1 |
| BEDtools | genome coordinate comparison | [ | 2.26.0 |
| Samtools | alignment file handling; extraction of unmapped reads | | 1.3.1 |
| ggplot2 | R library for the analysis report | | 2.2.0 |
| data.table | R library for the analysis report | | 1.10.0 |
| knitr | R library for the analysis report | | 1.14.0 |
Sequence Read Archive (SRA) and Gene Expression Omnibus (GEO) accession numbers, genotype, and cell type of the samples used in the example analysis.
| Sample ID | GEO ID | QKI Status | Cell Type |
|---|---|---|---|
| SRR2923169 | GSM1939602 | QKI+/− | (CD14+) monocytes from peripheral blood |
| SRR2923170 | GSM1939603 | QKI+/+ | (CD14+) monocytes from peripheral blood |
| SRR2923171 | GSM1939604 | QKI+/− | differentiated CD14+ cells (macrophages) |
| SRR2923172 | GSM1939605 | QKI+/+ | differentiated CD14+ cells (macrophages) |
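The sample table encodes a 2 × 2 design (QKI genotype × cell type) with one sample per group, which yields the two QKI+/− vs. QKI+/+ comparisons of Figure 3. A small Python sketch of that pairing; the `SAMPLES` dictionary and `contrasts` function are illustrative and do not reflect CirComPara's actual metadata format.

```python
from typing import Dict, Tuple

# Sample sheet from the table: SRA run -> (QKI status, cell type).
# ASCII "+/-" is used in place of the table's typographic minus sign.
SAMPLES: Dict[str, Tuple[str, str]] = {
    "SRR2923169": ("QKI+/-", "monocyte"),
    "SRR2923170": ("QKI+/+", "monocyte"),
    "SRR2923171": ("QKI+/-", "macrophage"),
    "SRR2923172": ("QKI+/+", "macrophage"),
}


def contrasts(samples: Dict[str, Tuple[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Map each cell type to its (QKI+/- run, QKI+/+ run) comparison pair."""
    by_cell: Dict[str, Dict[str, str]] = {}
    for run, (status, cell_type) in samples.items():
        by_cell.setdefault(cell_type, {})[status] = run
    return {cell: (runs["QKI+/-"], runs["QKI+/+"]) for cell, runs in by_cell.items()}
```

Each pair gives the case and control columns for a per-cell-type fold-change comparison.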