Lucia Coscujuela Tarrero1,2, Giulio Ferrero1,2,3, Valentina Miano1,2, Carlo De Intinis2, Laura Ricci1,2, Maddalena Arigoni4, Federica Riccardo4, Laura Annaratone5, Isabella Castellano5, Raffaele A Calogero1,4, Marco Beccuti3, Francesca Cordero1,3, Michele De Bortoli1,2.
Abstract
Circular RNAs (circRNAs) are highly stable molecules, present in all eukaryotes, that are generated by distinct transcript processing. We have exploited poly(A-) RNA-Seq data generated in our lab in MCF-7 breast cancer cells to define a compilation of exonic circRNAs more comprehensive than previously existing lists. Development of a novel computational tool, named CircHunter, allowed us to more accurately characterize circRNAs and to quantitatively evaluate their expression in publicly available RNA-Seq data from breast cancer cell lines and tumor tissues. We observed and confirmed, by ChIP analysis, that exons involved in circularization events display significantly higher levels of the histone post-translational modification H3K36me3 than non-circularizing exons. This result has a potential impact on circRNA biogenesis, since H3K36me3 has been implicated in alternative splicing mechanisms. By analyzing an Ago-HITS-CLIP dataset, we also found that circularizing exons overlapped with an unexpectedly higher number of Ago binding sites than non-circularizing exons. Finally, we observed that a subset of MCF-7 circRNAs are specific to tumor versus normal tissue, while others can distinguish Luminal from other tumor subtypes, thus suggesting that circRNAs can be exploited as novel biomarkers and drug targets for breast cancer.
Keywords: alternative splicing; biomarker; breast cancer; circRNA; estrogen receptor
Year: 2018 PMID: 29581865 PMCID: PMC5865691 DOI: 10.18632/oncotarget.24522
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. (A) Schematic representation of the computational pipeline applied for our prediction of circRNAs expressed in MCF-7 cell lines (CM7). BS = back-splicing. (B) Bar plot represents the number of circRNAs predicted in ENCODE MCF-7 Poly(A)+, Poly(A)-, and Total RNA-seq data, and in this study (CM7). The dark red color represents the fraction of predicted circRNAs belonging to the CM7 set. (C) Box plot showing the log2 average number of BS reads supporting the circRNAs selected (red) or filtered (pink) in our analysis. (D) Venn diagram shows the overlap between circRNAs predicted in this study and the annotations from circBase and circRNAdb. (E) Manhattan plot shows the number of circRNAs predicted on each human chromosome. The identifiers of circRNAs supported by more than 100 reads are reported. (F) Bar plot represents the RNase R resistance of a set of circRNAs predicted in our analysis (red) and of the linear transcripts generated from the corresponding host genes (blue). Standard deviations are computed from five independent biological replicates. p-value from Student's t test. *** = p-value < 0.001; ** = p-value < 0.01; * = p-value < 0.05.
Figure 2. (A) Histogram shows the distribution of the number of circRNAs produced by each circRNA host gene identified in our analysis. (B) Density plot shows the relative rank of exons involved in circularization. Dark red represents the 5ʹ Circularizing Exon (5ʹCE) and light red the 3ʹ Circularizing Exon (3ʹCE). (C) Box plot shows the gene length distribution of host genes (red), control genes (blue), and random set genes (grey); p-value by Wilcoxon Rank-Sum test. *** = p-value < 0.001. (D) Box plot shows the number of isoforms of host genes (red), control genes (Ctrl, blue), random genes (Rnd, grey), and control genes paired with host genes by the first intron length (Ctrl-I, cyan); p-value by Wilcoxon Rank-Sum test. *** = p-value < 0.001. (E) Box plot shows the length of the first intron of host genes (red), control genes (blue), and random set genes (grey); p-value by Wilcoxon Rank-Sum test. *** = p-value < 0.001. (F) Line plot represents the MCF-7 H3K36me3, Pol II, H3K4me3, and H3K27ac ChIP-seq signal profile measured in a genomic window of +/− 1 kb centered on the 5ʹ BS junction (left) and the 3ʹ BS junction (right). (G) Box plot shows the number of normalized H3K36me3 ChIP-Seq reads counted in the first four exons of host (red) and control genes (blue); p-value by Wilcoxon Rank-Sum test. *** = p-value < 0.001. (H) Genome Browser representation of the genomic regions involved in the formation of Circ_ZKSCAN1_2-3 circRNA. The H3K36me3 genomic coverage is reported in red. The coverage values are reported as reads per million sequenced reads. (I) Bar plot represents H3K36me3 ChIP enrichment at the selected exons of five circRNA host genes. The exon involved in the circularization is represented in red, while the flanking exons are represented in blue. The HSDP gene was used as a representative control gene since it was characterized by a similar level of expression in MCF-7 compared to the five analyzed circRNA host genes. 
The exons two, three, and four were selected for this gene. Bars indicate standard deviation from three independent biological replicates; ** = p-value < 0.01 and * = p-value < 0.05.
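The reads-per-million scaling mentioned for the coverage track in panel (H) can be illustrated with a minimal sketch. The function name and the example numbers below are hypothetical and are not taken from the study; the sketch only shows the arithmetic of dividing a raw count by the library size and multiplying by one million.

```python
def reads_per_million(count, total_reads):
    """Scale a raw read count by library size (hypothetical helper).

    count: reads overlapping a region of interest
    total_reads: total number of sequenced reads in the experiment
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return count * 1_000_000 / total_reads


# Illustrative numbers only: 250 reads in a region, 20 million reads sequenced.
rpm = reads_per_million(250, 20_000_000)
print(rpm)  # 12.5
```

Scaling by library size makes coverage values comparable between experiments sequenced at different depths.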
Figure 3. (A) Heat map represents the normalized number of BS junction reads of circRNAs differentially expressed in ER+ versus ER- BC cell lines. (B) Heat map represents the expression level of a set of 28 validated circRNAs. The expression level was measured by qRT-PCR, is relative to the lowest expressed circRNA in each cell line, and is reported on a light-to-dark blue color scale.
Figure 4. (A) Heat map represents the number of BS junction reads of circRNAs significantly Differentially Expressed (DE) in ER+ tumors versus Normal Breast Organoids (NBO) (left), ER+ tumors versus HER2 amplified tumors (middle), or ER+ tumor samples versus Triple Negative (TN) tumors (right). (B) Bar plot shows the log2 Fold Change of the ten most significant upregulated or downregulated DE circRNAs in each DE analysis performed (red). Their corresponding linear genes are represented in blue. CircRNAs whose host genes are known to be DE in a specific condition are shown in bold. (C) Radar plot represents the number of BS reads of three representative DE circRNAs, counted by HashCirc in 20 total RNA-Seq datasets from primary tumor specimens (ER = ER+ tumors; HER2 = HER2 amplified tumors; NBO = Normal Breast Organoids). (D) Heat map represents the expression level of ten selected circRNAs measured by qRT-PCR in 42 different tumor samples (left) and the level of expression of the corresponding host genes (right). The tumor molecular subtype and the positivity to ER expression or HER2 amplification are reported on top of the heat maps, according to clinical data.
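As an illustration of the log2 Fold Change reported in panel (B), the following is a minimal sketch of the calculation on two group means. The pseudocount and the example values are assumptions introduced here to avoid division by zero for unexpressed circRNAs; the source does not state how the fold changes were actually computed.

```python
import math


def log2_fold_change(mean_a, mean_b, pseudocount=1.0):
    """log2 ratio of two mean expression values (hypothetical helper).

    The pseudocount is an assumption of this sketch, not a documented
    parameter of the original analysis.
    """
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


# Illustrative numbers only: mean BS read counts of 15 versus 3.
lfc = log2_fold_change(15.0, 3.0)
print(lfc)  # 2.0, since (15 + 1) / (3 + 1) = 4
```

A positive value indicates higher expression in the first group, a negative value higher expression in the second.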
Figure 5. (A) Line plot shows the normalized number of Ago-HITS-CLIP reads counted in a genomic window of +/− 500 bp centered on each circRNA BS site (left) or on the corresponding linear splicing sites of genes from the control gene set (right). Data from all three biological replicates of the experiment are reported. CE, Circularizing exon. (B) Box plot represents the number of Ago-HITS-CLIP reads counted within a genomic region of +/− 100 bp centered on each circRNA BS site (red) or control gene splicing site (blue). p-value by Wilcoxon Rank-Sum test. ***, p-value < 0.001. (C) WashU Genome Browser representation of the circRNA predicted at the ZNF91 gene. The normalized genomic coverage of the three Ago-HITS-CLIP experiments is reported at the bottom. The signal was normalized on the total number of reads sequenced in each experiment. (D) Histogram representing the number of sequences overlapped with Ago-HITS-CLIP reads considering the CM7 BS junctions (red) and 100 sets of 3,271 sequences generated by random permutation of CM7 BS sequence halves (blue). (E) Box plot representing the log10 average reads in three different datasets: 127 AGO-overlapped back-splicing sequences (red), 100 sets of random sequences generated by randomly permuting 127 sequences randomly selected from the 3,271 CM7 back-splicing sequence halves (Permuted Set 1, blue), and 100 sets of random sequences generated by permuting the 127 back-splicing sequences overlapped with AGO (Permuted Set 2, blue); p-value by Wilcoxon Rank-Sum test. ***, p-value < 0.001. (F) Bar plot showing the average normalized Ago reads overlapped with the top 10 CM7 BS junctions (red) and a set composed of linear mRNA splicing sequences between the two exons involved in the circularization (light blue). The internal junction was defined by considering the 5’ CE and the following exon in the linear transcript. For Circ_RPPH1_1, mapped on a monoexonic gene, no control junction was available. 
For the monoexonic circRNA Circ_EXO5_3 no internal junction was available. The number of reads counted on the linear mRNA splicing sequences involving the CE and the flanking exon at the 5’ (cyan) and the flanking exon at the 3’ (blue) were also reported.
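The permuted control sets described in panels (D) and (E) can be pictured with a minimal sketch. It assumes that each BS junction sequence is split into an upstream and a downstream half and that a permuted set is built by re-pairing the halves at random; this is one plausible reading of the caption, not a confirmed description of the authors' procedure. The function name, the seed, and the toy sequences are hypothetical.

```python
import random


def permute_junction_halves(junctions, n_sets=100, seed=0):
    """Build control sets by randomly re-pairing junction halves.

    junctions: list of (upstream_half, downstream_half) string pairs.
    Returns n_sets lists, each holding one re-paired sequence per junction.
    The fixed seed is an assumption made here for reproducibility.
    """
    rng = random.Random(seed)
    upstream = [up for up, _ in junctions]
    downstream = [down for _, down in junctions]
    permuted_sets = []
    for _ in range(n_sets):
        shuffled = downstream[:]  # copy so the input order is preserved
        rng.shuffle(shuffled)
        permuted_sets.append([up + down for up, down in zip(upstream, shuffled)])
    return permuted_sets


# Toy junction halves, for illustration only.
toy = [("ACGT", "TTGA"), ("GGCA", "CCAT"), ("TTAG", "GACC")]
sets = permute_junction_halves(toy, n_sets=5)
print(len(sets), len(sets[0]))  # 5 3
```

Each permuted set keeps the sequence composition of the real junctions while breaking the specific pairing of halves, so overlap with Ago-HITS-CLIP reads in the permuted sets gives a background against which the real junctions can be compared.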