
CSCD: a database for cancer-specific circular RNAs.

Siyu Xia1,2,3, Jing Feng4, Ke Chen5, Yanbing Ma1, Jing Gong6, Fangfang Cai1, Yuxuan Jin1, Yang Gao1, Linjian Xia1, Hong Chang1, Lei Wei1, Leng Han6, Chunjiang He1,2,3.   

Abstract

Circular RNAs (circRNAs) are a large family of RNAs found extensively in cells and tissues. High-throughput sequencing provides a way to profile circRNAs across different samples, especially in various diseases. However, there is still no comprehensive database for exploring cancer-specific circRNAs. We collected 228 total RNA or polyA(-) RNA-seq samples from both cancer and normal cell lines, and identified 272 152 cancer-specific circRNAs. A total of 950 962 circRNAs were identified in normal samples only, and 170 909 circRNAs were identified in both tumor and normal samples, which can be further used as non-tumor background. We constructed a cancer-specific circRNA database (CSCD, http://gb.whu.edu.cn/CSCD). To understand the functional effects of circRNAs, we predicted the microRNA response element sites and RNA binding protein sites for each circRNA. We further predicted potential open reading frames to highlight translatable circRNAs. To understand the association between linear splicing and back-splicing, we also predicted the splicing events in the linear transcripts of each circRNA. We believe that CSCD, as the first comprehensive cancer-specific circRNA database, could contribute significantly to research on the function and regulation of cancer-associated circRNAs.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2018        PMID: 29036403      PMCID: PMC5753219          DOI: 10.1093/nar/gkx863

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Circular RNAs (circRNAs) have largely been discovered by high-throughput sequencing (1), including many tissue-specific and cell-specific circRNAs (2,3). A few circRNAs have been functionally characterized in human diseases and other biological processes (4). For example, the circRNA CDR1as (antisense to the cerebellar degeneration-related protein 1 transcript) was reported to be a miR-7 sponge and can inhibit the function of miR-7 in various cancers (5,6). CircCCDC66 serves as a miRNA sponge and regulates colon cancer growth and metastasis (7). CircMTO1 acts as a sponge of oncogenic miR-9 to promote p21 (cyclin-dependent kinase inhibitor 1) expression (8). CircHIPK3 regulates cell growth by sponging nine miRNAs (9). CircRNAs also act as sponges for RNA binding proteins (10,11). For example, circFoxo3 interacts with the anti-senescent protein ID-1 and the transcription factor E2F1 to increase cellular senescence (12). A fusion-circRNA derived from fusion genes exerts important functions in leukemogenesis by interacting with the fusion protein (13). RNA binding proteins are also involved in back-splicing (14). For example, the RNA binding protein FUS regulates circRNA biogenesis in mouse motor neurons (15). Recent studies reported the functional impact of alternative splicing on the biogenesis of circRNAs (10). Modulating the levels of the splicing factor muscleblind (MBL) affects circMBL biosynthesis, revealing competition between regular splicing and circularization (16). Inhibition of the canonical spliceosome by an mRNA splicing inhibitor reduces the levels of both the circRNA and the parent linear transcript, indicating a role for the mRNA spliceosome in circRNA biogenesis (17). Several circRNAs can be translated into proteins. For example, circ-ZNF609 contains an open reading frame and is translated into a protein in murine and human myoblasts (18). CircMBL encodes a protein in fly heads (19). However, the functional features of most circRNAs remain to be characterized.
CircRNAs have potential as diagnostic markers (6,20–22). Recent studies identified large numbers of circRNAs and constructed several circRNA databases. For example, Circ2Traits was compiled to link circRNAs to human diseases, including diabetes and asthma (23). CircBase integrates several circRNA datasets into a standardized database, which allows users to explore circRNAs or download custom Python scripts to identify circRNAs in their own RNA-seq data (24). CircNet provides circRNA expression profiles across hundreds of samples and illustrates circRNA-miRNA-gene regulatory networks (25). circRNADb provides protein-coding annotations for human exonic circRNAs (26). CircInteractome explores the binding of microRNAs and RNA binding proteins to circRNAs from circBase (27). However, there was no database focusing on cancer-specific circRNAs. Therefore, we collected RNA sequencing data from 87 cancer cell line samples across 19 cancer types and 141 normal cell samples from ENCODE, and constructed a cancer-specific circRNA database (CSCD, http://gb.whu.edu.cn/CSCD).

DATA COLLECTION AND DATABASE CONTENT

Cell line samples in CSCD

Previous studies showed that RNA-seq libraries prepared from rRNA-depleted total RNA or by the polyA(-) enrichment method (enriched for RNAs with no polyA tails (28–30)) are appropriate and efficient for characterizing circRNAs (31). Therefore, we collected such RNA-seq samples from ENCODE (https://www.encodeproject.org/). In total, we collected 87 cancer cell line samples across 19 cancer types and 141 normal cell line samples (Supplementary Table S1).

Identification of cancer-specific circRNAs

To identify cancer-specific circRNAs (CS-circRNAs), four popular, high-performing algorithms (32), CIRI2 (31,33), find_circ (5), circRNA_finder (34) and CIRCexplorer (35,36), were used to detect the back-splice junction sites of circRNAs. The potential spliced exons of each circRNA were identified by CIRI2 (31). We included all circRNAs identified by at least one of the four algorithms with at least one back-splice junction read, so that users can select circRNAs for follow-up experiments by their own criteria (e.g. number of junction reads). Genome assembly GRCh37 and GENCODE (version 19) gene annotation were used. We also converted circRNA coordinates from GRCh37 to GRCh38 for users to browse. We identified 443 061 circRNAs from cancer cell lines. We then compared these circRNAs with the 1 121 871 circRNAs identified from normal cell lines, and defined the 272 152 circRNAs found only in cancer cell lines as CS-circRNAs (Supplementary Figure S1A). The circRNAs identified in normal samples can be further used as non-tumor background. Interestingly, we identified more circRNAs in normal samples than in tumor samples, even after adjusting for sample size. This is likely due to the larger number of mapped reads in normal samples (Supplementary Figure S1B). Among the CS-circRNAs, 119 887, 105 398 and 31 575 are located in exonic, intronic and intergenic regions, respectively. We also identified 213 882 and 11 403 CS-circRNAs located in mRNA and lncRNA, respectively (Table 1). We identified many circRNAs that are not present in other databases. For example, we identified 17 circRNAs for CDR1-AS, of which only one was observed in circBase. This is largely because CSCD includes different samples, especially cancer samples, further suggesting the importance of a cancer-specific database.
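Table 1. Prediction of cancer-specific and normal circRNAs

| Type            | Number of circRNAs | Exonic  | Intronic | Intergenic | mRNA      | lncRNA | MRE        | RBP         | ORF       |
|-----------------|--------------------|---------|----------|------------|-----------|--------|------------|-------------|-----------|
| Cancer specific | 272 152            | 119 887 | 105 398  | 31 575     | 213 882   | 11 403 | 14 921 788 | 15 719 824  | 564 047   |
| Normal          | 950 962            | 505 705 | 310 872  | 80 882     | 789 166   | 27 411 | 52 417 822 | 66 182 210  | 2 287 210 |
| Common          | 170 909            | 133 447 | 20 398   | 10 349     | 150 494   | 3351   | 9 100 345  | 22 025 003  | 610 840   |
| Total           | 1 394 023          | 759 039 | 436 668  | 122 806    | 1 153 542 | 42 165 | 76 439 955 | 103 927 037 | 3 462 097 |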
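
The split into cancer-specific, normal and common circRNAs described above amounts to a union over callers followed by set differences. The following is a minimal sketch of that logic, not the authors' pipeline: the function names, the `(chromosome, start, end)` key and the toy calls are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Assumption: a circRNA is keyed by its back-splice junction (chromosome, start, end).
CircKey = Tuple[str, int, int]


def merge_callers(calls: Dict[str, Dict[CircKey, int]], min_reads: int = 1) -> Set[CircKey]:
    """Union of junctions reported by any caller with at least `min_reads` junction reads."""
    merged: Set[CircKey] = set()
    for junction_reads in calls.values():
        merged.update(key for key, reads in junction_reads.items() if reads >= min_reads)
    return merged


def partition(cancer: Set[CircKey], normal: Set[CircKey]):
    """Return (cancer-specific, normal-only, common) circRNA sets."""
    return cancer - normal, normal - cancer, cancer & normal


# Toy, made-up calls: caller name -> {junction: number of junction reads}.
cancer_calls = {
    "CIRI2": {("chrX", 100, 500): 3, ("chr1", 10, 90): 1},
    "find_circ": {("chr1", 10, 90): 2, ("chr2", 5, 50): 1},
}
normal_calls = {
    "CIRI2": {("chr1", 10, 90): 4},
    "CIRCexplorer": {("chr3", 7, 70): 2},
}

cancer_specific, normal_only, common = partition(
    merge_callers(cancer_calls), merge_callers(normal_calls)
)
print(len(cancer_specific), len(normal_only), len(common))
```

The reported counts are consistent with this partition: 443 061 − 170 909 = 272 152 cancer-specific circRNAs and 1 121 871 − 170 909 = 950 962 normal-only circRNAs. Raising `min_reads` mimics a user applying a stricter junction-read criterion.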

Prediction of cellular localization

Previous work reported that most exon-derived circRNAs are found in the cytosol (37), while circRNAs consisting of introns or of exons and introns are mainly found in the nucleus (38). To provide a comprehensive view of the cellular localization of CS-circRNAs, we extracted cellular localization data where available. There are 19 228, 2107, 7020, 35 734, 37 453, 37 141 and 16 976 CS-circRNAs localized to the cytosol, insoluble cytoplasm, membrane, chromatin, nucleus, nucleoplasm and nucleolus, respectively.

Prediction of MRE, RBP and ORF

CircRNAs have been reported to act as microRNA sponges and to regulate gene expression through microRNA response elements (MREs) (39). To understand the potential regulatory functions of CS-circRNAs, a 100 bp window (±50 bp) (27) surrounding the back-splice site of each CS-circRNA was scanned for potential MREs with TargetScan (40). By scanning the junction region for miRNA seed matches (7mer-m8, 7mer-1a and 8mer), we identified 14 921 788 MREs in CS-circRNAs, 52 417 822 MREs in normal circRNAs and 9 100 345 MREs in common circRNAs. Another potential function of circRNAs is to serve as sponges for RNA-binding proteins (RBPs) (41). Using CLIP-Seq data on the binding sites of 37 RBPs from starBase (42), we identified potential RBP binding events in circRNAs: 15 719 824 RBP binding sites in CS-circRNAs, 66 182 210 in normal circRNAs and 22 025 003 in common circRNAs. Compared with the previous database CircInteractome, CSCD contains more circRNAs, MREs and RBP binding sites (Supplementary Table S2), as we included more samples, especially many cancer samples. Recent studies demonstrated the protein-coding ability of circRNAs, which had previously been considered non-coding RNAs (18,19). To examine the translational potential of circRNAs, open reading frames (ORFs) were predicted from the full-length sequence of each circRNA with ORF Finder. The minimum ORF length was set to 300 nt, following a previous work (26). We identified a total of 564 047, 2 287 210 and 610 840 ORFs in cancer-specific, normal and common circRNAs, respectively.
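
To make the junction-window scan concrete, here is a minimal sketch of seed-match classification across a back-splice junction. It is an illustration under stated assumptions, not TargetScan itself: the helper names are hypothetical, the 8-nt miRNA 5' end and the circRNA sequence are toy inputs, and the site definitions follow the commonly used convention (8mer: pairing to miRNA positions 2-8 plus an A opposite position 1; 7mer-m8: pairing to positions 2-8; 7mer-1a: pairing to positions 2-7 plus that A).

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def junction_window(circ_seq: str, flank: int = 50) -> str:
    """Join the last `flank` nt to the first `flank` nt, i.e. read across the
    back-splice junction. Assumes the circRNA is at least 2 * flank nt long."""
    seq = circ_seq.upper().replace("T", "U")
    return seq[-flank:] + seq[:flank]


def scan_mres(window: str, mirna: str) -> List[Tuple[int, str]]:
    """Return (site start, site type) for each seed match in `window`."""
    core = revcomp(mirna[1:7])    # pairs with miRNA positions 2-7
    m8_base = revcomp(mirna[7])   # pairs with miRNA position 8
    hits = []
    pos = window.find(core)
    while pos != -1:
        has_m8 = pos >= 1 and window[pos - 1] == m8_base
        has_a1 = pos + 6 < len(window) and window[pos + 6] == "A"
        if has_m8 and has_a1:
            hits.append((pos - 1, "8mer"))
        elif has_m8:
            hits.append((pos - 1, "7mer-m8"))
        elif has_a1:
            hits.append((pos, "7mer-1a"))
        pos = window.find(core, pos + 1)
    return hits


# Toy inputs: an illustrative miRNA 5' end and a made-up circRNA whose only
# seed match is split by the linear ends, so it exists only across the junction.
mirna_5p = "UGGAAGAC"
circ_seq = "UUCCA" + "C" * 100 + "GUC"

window = junction_window(circ_seq)
hits = scan_mres(window, mirna_5p)
print(hits)
```

In this toy case the 8mer site is absent from the linear sequence and appears only once the two ends are joined, which is why the scan is run on a window centered on the back-splice site.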

Detection of alternative splicing events of parent genes

Alternative splicing (AS), among the most frequent events in the transcriptional process, may affect the biogenesis of circRNAs (31,36). To understand the relationship between alternative splicing of linear genes and the biogenesis of circRNAs, we predicted alternative splicing events across all samples. RNA-seq read alignment was performed with STAR (43). Potential alternative splicing events of each linear gene, including skipped exons, alternative 5′ splice sites, alternative 3′ splice sites, mutually exclusive exons and retained introns, were detected by rMATS (44) with default parameters.

DATABASE ORGANIZATION AND WEB INTERFACE

All the data, including gene annotation, circRNAs, MREs, RBPs, ORFs and AS events associated with circRNAs, were organized into a set of interactive MySQL tables. ThinkPHP, an open-source web framework based on PHP (https://github.com/top-think), and a JavaScript library were used to construct the CSCD database. To make data queries convenient and efficient, we organized the database into three sub-databases by data type (cancer-specific, normal and common). The web interface of CSCD is summarized in Figure 1. The main page of CSCD is composed of three panels.
Figure 1.

Overview of CSCD. (A) Query panel of the circRNA. CircRNA can be viewed and searched by sample name, gene symbol and circRNA ID, etc. (B) Gene panel. Image and information of gene, transcript, alternative splicing and circRNAs are displayed on this panel. (C) CircRNA panel. Image and information of circRNA and related location of MRE, RBP and ORF are displayed on this panel.


Query Panel to search/browse circRNAs

In this panel, users can browse circRNAs by selecting sample type, sample name, gene symbol or cellular localization, and can search by circRNA ID (e.g. chrX:18928998|18938303, which represents the donor and acceptor sites of the circRNA) or gene symbol in the search box. All information for each circRNA, including the parent gene, sample type, circRNA ID, UCSC genome browser link (45), sample source, genomic coordinates, lncRNA/mRNA annotation, ratio of circRNA/linear RNA, spliced exons, circBase ID, cellular localization, identification algorithm, number of junction reads and log2 SRPTM (number of circular reads/number of mapped reads (units in trillion)/read length) (2), is displayed in the table (Figure 1A). The gene symbol links to the Gene Panel with all circRNAs across different samples (Figure 1B). The circRNA ID links to the circRNA Panel with a circRNA in a specific sample (Figure 1C).
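
As a small illustration of the two fields described above, the sketch below parses a circRNA ID of the form `chrX:18928998|18938303` and computes log2 SRPTM by reading the stated definition literally (circular reads divided by mapped reads expressed in trillions, divided by read length). The function names and the example read counts are assumptions for illustration; CSCD's own implementation may differ in detail.

```python
import math
from typing import NamedTuple


class CircId(NamedTuple):
    chrom: str
    start: int  # first coordinate in the ID
    end: int    # second coordinate in the ID


def parse_circ_id(circ_id: str) -> CircId:
    """Split an ID such as 'chrX:18928998|18938303' into its parts."""
    chrom, coords = circ_id.split(":")
    start, end = coords.split("|")
    return CircId(chrom, int(start), int(end))


def log2_srptm(circ_reads: int, mapped_reads: int, read_length: int) -> float:
    """log2 of circular reads / (mapped reads in trillions) / read length.
    This is a literal reading of the definition in the text (an assumption)."""
    srptm = circ_reads / (mapped_reads / 1e12) / read_length
    return math.log2(srptm)


circ = parse_circ_id("chrX:18928998|18938303")
print(circ.chrom, circ.start, circ.end)

# Hypothetical library: 10 junction reads, 50 million mapped reads, 100 nt reads.
print(round(log2_srptm(10, 50_000_000, 100), 2))
```

Normalizing by both library size and read length makes values more comparable across samples sequenced at different depths and read lengths.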

Gene panel to view all circRNAs for selected gene

In this panel, users can view the circRNAs and their linear parent gene in the Overview tab. Linear gene structures are displayed with differently colored rectangles for exons and black lines for introns, while circRNAs are shown as colored curves. All transcripts and potential alternative splicing events of the linear parent gene are displayed below. Users can zoom in for a high-resolution image by clicking the top right corner of the panel. Detailed information is listed in the gene, transcript, circRNA and splicing tabs (Figure 1B). Each circRNA curve links to the corresponding circRNA in the circRNA panel (Figure 1C).

CircRNA panel to view specific circRNA

In this panel, users can view a selected circRNA with its constituent exons drawn as a colored circle (Figure 1C). Each arc with a numeric ID depicts one exon, while introns are displayed as black lines. Users can also view the number and position of the MRE (red triangle), RBP (blue rectangle) and ORF (green arc) elements located in the circRNA, and check the detailed information through the circRNA, MRE, RBP and ORF tabs, respectively. Users can zoom in for a high-resolution image by clicking the top right corner of the panel.

SUMMARY AND FUTURE DIRECTIONS

CSCD collects available RNA-seq datasets from cancer and normal samples, and provides an integrated circRNA database to benefit functional studies of cancer-specific circRNAs. CSCD also collects normal and common circRNAs, which users can draw on for other studies. For example, users can compare circRNAs in their own cancer samples with CSCD to examine whether those circRNAs are cancer-specific. Users can explore the potential regulatory functions and translation of these circRNAs through the predicted MREs, RBP binding sites and ORFs. Users can also relate back-splicing to alternative splicing of linear genes through the predicted splicing events. Owing to the limited number of rRNA-depleted total RNA or polyA(-) RNA-seq datasets for primary tumor samples, we did not include any primary tumor samples. We will update CSCD when more sequencing data from primary samples are available.
  45 in total

1.  rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data.

Authors:  Shihao Shen; Juw Won Park; Zhi-xiang Lu; Lan Lin; Michael D Henry; Ying Nian Wu; Qing Zhou; Yi Xing
Journal:  Proc Natl Acad Sci U S A       Date:  2014-12-05       Impact factor: 11.205

Review 2.  Circular RNA: A new star of noncoding RNAs.

Authors:  Shibin Qu; Xisheng Yang; Xiaolei Li; Jianlin Wang; Yuan Gao; Runze Shang; Wei Sun; Kefeng Dou; Haimin Li
Journal:  Cancer Lett       Date:  2015-06-05       Impact factor: 8.679

3.  Circular RNA: a novel biomarker for progressive laryngeal cancer.

Authors:  Lijia Xuan; Lingmei Qu; Han Zhou; Peng Wang; Haoyang Yu; Tianyi Wu; Xin Wang; Qiuying Li; Linli Tian; Ming Liu; Yanan Sun
Journal:  Am J Transl Res       Date:  2016-02-15       Impact factor: 4.060

4.  Detecting and characterizing circular RNAs.

Authors:  William R Jeck; Norman E Sharpless
Journal:  Nat Biotechnol       Date:  2014-05       Impact factor: 54.908

5.  Circular RNAs are a large class of animal RNAs with regulatory potency.

Authors:  Sebastian Memczak; Marvin Jens; Antigoni Elefsinioti; Francesca Torti; Janna Krueger; Agnieszka Rybak; Luisa Maier; Sebastian D Mackowiak; Lea H Gregersen; Mathias Munschauer; Alexander Loewer; Ulrike Ziebold; Markus Landthaler; Christine Kocks; Ferdinand le Noble; Nikolaus Rajewsky
Journal:  Nature       Date:  2013-02-27       Impact factor: 49.962

6.  CIRI: an efficient and unbiased algorithm for de novo circular RNA identification.

Authors:  Yuan Gao; Jinfeng Wang; Fangqing Zhao
Journal:  Genome Biol       Date:  2015-01-13       Impact factor: 13.583

7.  Circular RNA profiling reveals an abundant circHIPK3 that regulates cell growth by sponging multiple miRNAs.

Authors:  Qiupeng Zheng; Chunyang Bao; Weijie Guo; Shuyi Li; Jie Chen; Bing Chen; Yanting Luo; Dongbin Lyu; Yan Li; Guohai Shi; Linhui Liang; Jianren Gu; Xianghuo He; Shenglin Huang
Journal:  Nat Commun       Date:  2016-04-06       Impact factor: 14.919

8.  FUS affects circular RNA expression in murine embryonic stem cell-derived motor neurons.

Authors:  Lorenzo Errichelli; Stefano Dini Modigliani; Pietro Laneve; Alessio Colantoni; Ivano Legnini; Davide Capauto; Alessandro Rosa; Riccardo De Santis; Rebecca Scarfò; Giovanna Peruzzi; Lei Lu; Elisa Caffarelli; Neil A Shneider; Mariangela Morlando; Irene Bozzoni
Journal:  Nat Commun       Date:  2017-03-30       Impact factor: 14.919

9.  starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data.

Authors:  Jun-Hao Li; Shun Liu; Hui Zhou; Liang-Hu Qu; Jian-Hua Yang
Journal:  Nucleic Acids Res       Date:  2013-12-01       Impact factor: 16.971

10.  Integrator mediates the biogenesis of enhancer RNAs.

Authors:  Fan Lai; Alessandro Gardini; Anda Zhang; Ramin Shiekhattar
Journal:  Nature       Date:  2015-08-26       Impact factor: 49.962

