
circBase: a database for circular RNAs.

Petar Glažar¹, Panagiotis Papavasileiou¹, Nikolaus Rajewsky².

Abstract

Recently, several laboratories have reported thousands of circular RNAs (circRNAs) in animals. Numerous circRNAs are highly stable and have specific spatiotemporal expression patterns. Even though a function for circRNAs is unknown, these features make circRNAs an interesting class of RNAs as possible biomarkers and for further research. We developed a database and website, "circBase," where merged and unified data sets of circRNAs and the evidence supporting their expression can be accessed, downloaded, and browsed within the genomic context. circBase also provides scripts to identify known and novel circRNAs in sequencing data. The database is freely accessible through the web server at http://www.circbase.org/.

Keywords: circular RNA; database; gene expression

Year: 2014 PMID: 25234927 PMCID: PMC4201819 DOI: 10.1261/rna.043687.113

Source DB: PubMed Journal: RNA ISSN: 1355-8382 Impact factor: 4.942

INTRODUCTION

In recent years, several laboratories have reported thousands of circular RNAs (termed circRNAs) expressed in animal cells (Salzman et al. 2012, 2013; Jeck et al. 2013; Memczak et al. 2013). These single-stranded RNA molecules correspond to circular isoforms, often produced from exons, in which the 5′ and 3′ ends are covalently closed to form a “head-to-tail” splice junction (or “backsplice”). It has been shown that in humans, circRNAs are the predominant isoform of exon-scrambling events and that circularization is a widespread and general feature of gene expression (Salzman et al. 2012). Two recent articles provided evidence that the human circRNA CDR1as/ciRS-7 can have a regulatory function by acting as a miRNA sponge (Hansen et al. 2013; Memczak et al. 2013). However, it is still unknown whether circRNAs have biological functions, since loss-of-function experiments are missing and, in many cases, are difficult to carry out due to the overlap between circular and linear isoforms. Nevertheless, numerous circRNAs seem well expressed (Jeck et al. 2013; Memczak et al. 2013; Salzman et al. 2013), often in a tissue-specific and developmental stage–specific manner (Memczak et al. 2013). Moreover, circRNAs are unusually stable RNA molecules, presumably because their lack of ends protects them from conventional RNA degradation pathways. Thus, circRNAs can be diagnostic of gene expression patterns and may constitute an interesting, novel class of biomarkers. Thousands of circRNAs in humans, mice, and other animals have been reported by computational analysis of sequencing data, and the numbers are likely to grow. For an overview of the detection methods used, see articles by Hoffmann et al. (2014) and Jeck and Sharpless (2014).
To meet the need for an integrated database that enables the study, comparison, or download of circRNAs, we developed “circBase,” with the following objectives: (1) For each circRNA, the evidence for its existence and expression should be summarized and accessible; (2) circRNAs should be presented within the genomic context and together with available expression or regulatory data; (3) a search engine should allow flexible queries, for example, by sequence, by gene, by genomic location, etc.; (4) it should be possible to intersect and download data sets in multiple ways and formats; and (5) users should be able to upload tracks with their own circRNA expression and intersect them with the available data. In essence, we intend to serve the community by pooling together and making accessible published data sets from different laboratories. In its current implementation (version 1.0), circBase addresses these objectives by (1) a simple searching interface that is convenient for users focusing on a small number of genes or loci; (2) various methods of bulk data retrieval; (3) merging, unifying, and annotating published data sets; and (4) linking circRNAs to the UCSC genome browser (Kent et al. 2002) and doRiNA (Anders et al. 2012), a database for post-transcriptional regulatory elements. Additionally, we (Memczak et al. 2013) have described a method for detecting circRNAs by computational analysis of RNA sequencing data. This method does not depend on preexisting genome annotation (e.g., known transcripts or splice sites). The latest version of the software needed to perform this analysis is also available on circBase. The database currently hosts data from various Homo sapiens, Mus musculus, Caenorhabditis elegans, and Latimeria samples. We expect the range of organisms and samples to expand in the future. circBase will not cover viroids, which are already collected in other resources (Rocheleau and Pelchat 2006; Hamilton et al. 2011).

RESULTS AND DISCUSSION

Searching circBase

There are three main ways to query circBase: (1) Simple search, (2) List search, and (3) Table browser. Simple search, available from the server homepage, is intended for simple queries by identifiers, genomic location, sequence, gene description, or Gene Ontology term identifiers. Apart from internal identifiers, we provide support for RefSeq transcript identifiers, HGNC gene symbols, and names of particular circRNAs from the literature (e.g., CDR1as). We cannot exclude the possibility of different investigators using the same name for multiple circRNAs. Therefore, only circBase internal identifiers, which are based on the genomic location of a head-to-tail splice site, uniquely identify database records. In the search field, genomic locations can be given in a UCSC-compatible format (e.g., chrX:134,619,072-134,756,686) or as whitespace-separated chromosome, start, and end coordinates. When searching by genomic coordinates, the list of results will contain all circRNAs that overlap the specified genomic region. Search by gene description terms (e.g., “apoptosis”) will return circRNAs that are produced from transcripts of genes matching the description. Description terms are derived from official full gene names in the NCBI Entrez Gene database (Maglott et al. 2011). Users can also query the database by DNA or RNA sequence, in which case the number of exact sequence matches per circRNA will be shown. This search option is limited to sequences >6 nucleotides (nt), and only exact matches of the queried sequence, or its reverse complement, will be reported. Searches can be limited to a particular animal species. For nonexact sequence searches, users may use Blat (Kent 2002), available from the main menu. circBase hosts Blat references of human, mouse, and C. elegans circRNAs. In these references, circRNA sequences are cut opposite the head-to-tail splice junction, which keeps the junction intact and allows continuous matches that overlap it to be found.
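The exact sequence search and the junction-preserving cut can be illustrated with a minimal Python sketch. It is not the circBase implementation: it assumes that a circRNA is stored as a linear string whose last and first nucleotides flank the head-to-tail junction, and all function names are hypothetical.

```python
def normalize(seq: str) -> str:
    """Upper-case the sequence and treat RNA (U) as DNA (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a normalized DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def cut_opposite_junction(circ_seq: str) -> str:
    """Linearize a circle so the junction ends up in the middle.

    Assumption: in `circ_seq` the junction lies between the last and the
    first nucleotide. Rotating by half the length moves it to the center.
    """
    half = len(circ_seq) // 2
    return circ_seq[half:] + circ_seq[:half]


def count_occurrences(text: str, pattern: str) -> int:
    """Count possibly overlapping exact occurrences of pattern in text."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def count_exact_matches(circ_seq: str, query: str, min_len: int = 7) -> int:
    """Exact matches of the query or its reverse complement.

    min_len=7 mirrors the ">6 nt" limit described in the text.
    """
    query = normalize(query)
    if len(query) < min_len:
        raise ValueError("query must be longer than 6 nt")
    linear = cut_opposite_junction(normalize(circ_seq))
    hits = count_occurrences(linear, query)
    rc = reverse_complement(query)
    if rc != query:  # avoid double counting palindromic queries
        hits += count_occurrences(linear, rc)
    return hits
```

For a toy circle `GATTACAGGCCTTAACG`, the query `AACGGATT` spans the junction and is found only after the rotation. A query that happens to span the new cut point would be missed by this sketch; handling that case is left out for brevity.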
List search allows the intersection of a large number of search terms with database contents. Upon selecting the organism and assembly, the user can paste or upload a list of any identifier type supported by circBase. It is also possible to submit a list of genomic regions (for example in BED format) and retrieve all overlapping circRNAs in the database. The Table browser can be used for conditional data retrieval (Fig. 1A). After selecting the organism and experiment of interest, the user can further refine the selection by a number of options, such as presence in a particular sample, range of genomic or spliced sequence lengths, number of reads supporting the head-to-tail splice junction, and many more. This interface was developed to enable simple generation of complex queries such as “Retrieve all the circRNAs from experiment A that are present in HEK293 cells, intergenic, between 1000 and 2000 bp long and do not overlap repeat sequences.”
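As an illustration of these two query styles, the following Python sketch parses genomic locations, retrieves records that overlap a list of regions, and applies Table browser-like conditions. The record fields, annotation values, and function names are hypothetical and do not reflect the circBase schema. Coordinates are treated as closed intervals in one shared convention; conversion between the 0-based half-open BED convention and 1-based browser coordinates is omitted.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Region = Tuple[str, int, int]


@dataclass(frozen=True)
class CircRecord:
    circ_id: str                # hypothetical identifier
    chrom: str
    start: int
    end: int
    samples: Tuple[str, ...]
    annotation: str             # hypothetical term, e.g. "intergenic"
    spliced_length: int
    overlaps_repeat: bool


_LOCATION = re.compile(r"(\S+?)[:\s]+(\d+)(?:-|\s+)(\d+)")


def parse_location(text: str) -> Region:
    """Parse 'chrX:1,000-2,000' or whitespace-separated 'chrX 1000 2000'."""
    # Drop thousands separators and accept a typeset en dash as a hyphen.
    cleaned = text.replace(",", "").replace("\u2013", "-").strip()
    match = _LOCATION.fullmatch(cleaned)
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end


def overlapping(records: Iterable[CircRecord],
                regions: List[Region]) -> List[CircRecord]:
    """Return every record that overlaps at least one query region."""
    return [
        rec for rec in records
        if any(rec.chrom == chrom and rec.start <= end and start <= rec.end
               for chrom, start, end in regions)
    ]


def table_query(records: Iterable[CircRecord],
                sample: Optional[str] = None,
                annotation: Optional[str] = None,
                min_len: Optional[int] = None,
                max_len: Optional[int] = None,
                allow_repeats: bool = True) -> List[CircRecord]:
    """Keep records that satisfy all of the conditions that were given."""
    selected = []
    for rec in records:
        if sample is not None and sample not in rec.samples:
            continue
        if annotation is not None and rec.annotation != annotation:
            continue
        if min_len is not None and rec.spliced_length < min_len:
            continue
        if max_len is not None and rec.spliced_length > max_len:
            continue
        if not allow_repeats and rec.overlaps_repeat:
            continue
        selected.append(rec)
    return selected
```

Under these assumptions, the example query from the text would correspond to `table_query(records, sample="HEK293", annotation="intergenic", min_len=1000, max_len=2000, allow_repeats=False)`. A linear scan is used for clarity; a database index or interval tree would be the natural choice for large data sets.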

FIGURE 1.

(A) circBase table browser. Table browser enables the user to design queries for conditional data retrieval. In this example, the user will retrieve all circRNAs from a CD34+ sample, 100–500 nt in length, that are transcribed from coding regions and do not overlap repeat sequences. (B) circBase results page. The user is presented with basic information about each circRNA that matched the search query. Columns, from left to right, contain information on the following: (1) the organism the data came from; (2) genomic position; (3) the strand the circRNA is transcribed from; (4) a unique identifier; (5) genomic length; (6) length of an in silico predicted spliced form; (7) list of samples the circRNA was observed in; (8) number of reads overlapping the circular junction; (9) repeat sequences overlapping the 3′ and 5′ ends that give rise to a head-to-tail splice junction; (10) circRNA annotation (annotation terms are described in documentation available on the circBase website); (11) an overlapping transcript; (12) gene symbol of the overlapping gene; and (13) a list of studies in which the particular circRNA was detected. (C) Reads mapped to a head-to-tail splice junction. Alignments of reads that support a head-to-tail splice junction can be retrieved from a single record page. Nucleotides that match the genomic reference are printed as dots, while mismatched nucleotides are represented as letters. (D) circBase single record page. In addition to data available from a results page, the user can explore more information about a particular circRNA on a single record page. (E) UCSC genome browser on doRiNA, a database for post-transcriptional regulatory elements. A region surrounding the CDR1as circRNA is shown. In addition to the basic genome browser tracks, the user can explore RNA binding protein PAR-CLIP tracks, ribosome profiling data, miRNA target prediction tracks, and much more.
All search results are returned as a hyperlinked table (Fig. 1B). The table contains basic information on every circRNA that matched the search criteria, while additional information can be accessed by following the related hyperlink. Clicking the genomic position links to the local copy (Anders et al. 2012) of the UCSC genome browser, while gene symbols and best transcripts are linked to the NCBI Entrez Gene (Maglott et al. 2011) and Reference Sequence (RefSeq) (Pruitt et al. 2007) databases, respectively (with the exception of C. elegans, where they are linked to WormBase; Chen et al. 2005). Data set and circRNA IDs link to pages with additional information. Apart from data visible in a results table, circRNA single-record pages (Fig. 1D) contain evidence for the occurrence of a particular circRNA, such as prediction score, read counts, alignment of reads to a head-to-tail splice junction (Fig. 1C), and literature-based evidence for circRNAs that were verified experimentally. Genomic positions are linked to our local UCSC Genome browser, which is a part of the doRiNA database. Therefore, in addition to information retrieved from circBase, the user can explore RNA binding protein target sites, predicted miRNA binding sites, and ribosome profiling tracks (Fig. 1E).
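The read alignment display (matches as dots, mismatches as letters) can be sketched in a few lines of Python. This is only an illustration of the display convention, assuming ungapped alignments against a reference string that spans the junction; the function name and the `offset` parameter are hypothetical.

```python
def render_read(reference: str, read: str, offset: int = 0) -> str:
    """Render an ungapped read aligned at `offset` on the reference.

    Matching nucleotides become '.', mismatches keep the read's letter.
    Leading spaces position the read under the reference.
    """
    window = reference[offset:offset + len(read)]
    if offset < 0 or len(window) != len(read):
        raise ValueError("read does not fit on the reference at this offset")
    body = "".join("." if ref == nt else nt for ref, nt in zip(window, read))
    return " " * offset + body
```

Printing the reference on the first line and one rendered read per following line gives a compact view in which sequencing errors or variants near the junction stand out as letters.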

Data export

Results can be exported using options in the “Export Results” ribbon below the main menu. Results tables are available in comma-separated values (csv), tab-delimited text (tsv), and Excel Workbook (xlsx) formats, while the nucleotide sequences of particular circRNAs can be downloaded as a compressed fasta file. Upon clicking the “fasta” option from the results page, the user can select between the genomic and the inferred spliced sequence, define a number of upstream and downstream flanking nucleotides, or download an arbitrary number of nucleotides around the 5′ and 3′ splice sites. Additional flat file export options are available from the “downloads” page.
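A minimal Python sketch of such exports is given below, using only the standard library. It is not the circBase export code: the function names are hypothetical, coordinates are assumed to be 0-based and half-open, and the xlsx format is omitted because it requires a third-party library.

```python
import csv
import gzip
import io
from typing import Dict, List


def to_delimited(rows: List[Dict[str, object]], fieldnames: List[str],
                 delimiter: str = ",") -> str:
    """Serialize result rows as csv (',') or tsv ('\\t') text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def genomic_fasta(name: str, chrom_seq: str, start: int, end: int,
                  strand: str = "+", upstream: int = 0, downstream: int = 0,
                  width: int = 60) -> str:
    """FASTA entry for [start, end) plus flanks, given in transcript sense.

    On the minus strand, "upstream" lies at higher genomic coordinates,
    so the flanks are swapped before the sequence is reverse-complemented.
    """
    if strand == "+":
        lo, hi = start - upstream, end + downstream
    else:
        lo, hi = start - downstream, end + upstream
    lo, hi = max(lo, 0), min(hi, len(chrom_seq))  # clip to chromosome bounds
    seq = chrom_seq[lo:hi]
    if strand == "-":
        seq = reverse_complement(seq)
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


def compress_fasta(fasta_text: str) -> bytes:
    """Gzip-compress FASTA text for download."""
    return gzip.compress(fasta_text.encode("ascii"))
```

For example, `to_delimited(rows, ["id", "length"], delimiter="\t")` would produce tab-delimited text, and `compress_fasta(genomic_fasta(...))` a gzip-compressed sequence file.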

Conclusions and future work

circBase facilitates circRNA research by (1) providing users with a single repository that collects, unifies, and annotates circRNA data in standardized formats; (2) extending public data sets with additional basic analysis; (3) integrating these data with external resources, such as the UCSC Genome Browser and NCBI databases; and (4) enabling users to interact with the database contents through a simple interface. While developing circBase, we put special emphasis on building a simple and self-explanatory interface, as well as on writing thorough user documentation that covers all the advanced options that may not be used routinely. We intend to regularly update the database with newly published data. Direct data submission by users is currently not supported, but users are encouraged to contact us with requests for adding their data to circBase.

Availability

The database is freely accessible through the web server at http://www.circbase.org/.

MATERIALS AND METHODS

Implementation and contents

Our web server user interface is implemented using HTML, CSS, and JavaScript. Data are stored in a MySQL database, which is interfaced to the front end by a series of server-side Perl CGI scripts. A web browser fully compatible with HTML5 and CSS3 is required for optimal performance. We have thoroughly tested the web server in Google Chrome 31.0, Mozilla Firefox 25.0, and Internet Explorer 10.0. To the best of our knowledge, circBase contains data from all studies of large-scale circRNA identification published to date (Table 1; Jeck et al. 2013; Memczak et al. 2013; Nitsche et al. 2013; Salzman et al. 2013; Zhang et al. 2013). In addition to the information made available by the investigators, we have annotated all circRNA transcripts, predicted their putative spliced forms, and, where applicable, provided alignments of reads spanning head-to-tail junctions. Transcript reannotation was necessary to standardize the content and level the amount of information contained across some publications (Jeck et al. 2013; Salzman et al. 2013). Unique identifiers are assigned to circRNAs and will provide fixed references for future circBase releases.
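The idea of unifying records from several studies under one location-based identifier can be sketched as follows in Python. This is an assumption-laden illustration, not the procedure used to build circBase: the merge key (organism, chromosome, start, end, strand), the input field names, and the `example_circ_` identifier format are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

JunctionKey = Tuple[str, str, int, int, str]


def merge_reports(reports: Iterable[dict]) -> List[dict]:
    """Merge per-study circRNA reports that share the same junction.

    Each report is a dict with the keys organism, chrom, start, end,
    strand, study, and sample (hypothetical field names).
    """
    merged: Dict[JunctionKey, dict] = {}
    for rep in reports:
        key = (rep["organism"], rep["chrom"], rep["start"],
               rep["end"], rep["strand"])
        entry = merged.setdefault(key, {"studies": set(), "samples": set()})
        entry["studies"].add(rep["study"])
        entry["samples"].add(rep["sample"])

    records = []
    # Numbering sorted keys is only for illustration; identifiers that
    # must stay fixed across releases need persistent assignment instead.
    for number, key in enumerate(sorted(merged), start=1):
        organism, chrom, start, end, strand = key
        records.append({
            "id": f"example_circ_{number:07d}",
            "organism": organism,
            "chrom": chrom,
            "start": start,
            "end": end,
            "strand": strand,
            "studies": sorted(merged[key]["studies"]),
            "samples": sorted(merged[key]["samples"]),
        })
    return records
```

Whether strand belongs in the key depends on the data: reports from unstranded libraries would need a different matching rule than the one assumed here.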

TABLE 1.

Circular RNA studies available in circBase version 1.0

REFERENCES (10 of 22 shown)

1. BLAT--the BLAST-like alignment tool.

Authors: W James Kent
Journal: Genome Res Date: 2002-04 Impact factor: 9.043

2. The human genome browser at UCSC.

Authors: W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal: Genome Res Date: 2002-06 Impact factor: 9.043

3. Analysis of the transcriptome of the Indonesian coelacanth Latimeria menadoensis.

Authors: Alberto Pallavicini; Adriana Canapa; Marco Barucca; Jessica Alfőldi; Maria Assunta Biscotti; Francesco Buonocore; Gianluca De Moro; Federica Di Palma; Anna Maria Fausto; Mariko Forconi; Marco Gerdol; Daisy Monica Makapedua; Jason Turner-Meier; Ettore Olmo; Giuseppe Scapigliati
Journal: BMC Genomics Date: 2013-08-08 Impact factor: 3.969

4. Strand-specific deep sequencing of the transcriptome.

Authors: Ana P Vivancos; Marc Güell; Juliane C Dohm; Luis Serrano; Heinz Himmelbauer
Journal: Genome Res Date: 2010-06-02 Impact factor: 9.043

5. Entrez Gene: gene-centered information at NCBI.

Authors: Donna Maglott; Jim Ostell; Kim D Pruitt; Tatiana Tatusova
Journal: Nucleic Acids Res Date: 2010-11-28 Impact factor: 16.971

6. An RNA-Seq strategy to detect the complete coding and non-coding transcriptome including full-length imprinted macro ncRNAs.

Authors: Ru Huang; Markus Jaritz; Philipp Guenzl; Irena Vlatkovic; Andreas Sommer; Ido M Tamir; Hendrik Marks; Thorsten Klampfl; Robert Kralovics; Hendrik G Stunnenberg; Denise P Barlow; Florian M Pauler
Journal: PLoS One Date: 2011-11-10 Impact factor: 3.240

7. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

Authors: Kim D Pruitt; Tatiana Tatusova; Donna R Maglott
Journal: Nucleic Acids Res Date: 2006-11-27 Impact factor: 16.971

8. WormBase: a comprehensive data resource for Caenorhabditis biology and genomics.

Authors: Nansheng Chen; Todd W Harris; Igor Antoshechkin; Carol Bastiani; Tamberlyn Bieri; Darin Blasiar; Keith Bradnam; Payan Canaran; Juancarlos Chan; Chao-Kung Chen; Wen J Chen; Fiona Cunningham; Paul Davis; Eimear Kenny; Ranjana Kishore; Daniel Lawson; Raymond Lee; Hans-Michael Muller; Cecilia Nakamura; Shraddha Pai; Philip Ozersky; Andrei Petcherski; Anthony Rogers; Aniko Sabo; Erich M Schwarz; Kimberly Van Auken; Qinghua Wang; Richard Durbin; John Spieth; Paul W Sternberg; Lincoln D Stein
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

9. The Subviral RNA Database: a toolbox for viroids, the hepatitis delta virus and satellite RNAs research.

Authors: Lynda Rocheleau; Martin Pelchat
Journal: BMC Microbiol Date: 2006-03-06 Impact factor: 3.605

10. A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection.

Authors: Steve Hoffmann; Christian Otto; Gero Doose; Andrea Tanzer; David Langenberger; Sabina Christ; Manfred Kunz; Lesca M Holdt; Daniel Teupser; Jörg Hackermüller; Peter F Stadler
Journal: Genome Biol Date: 2014-02-10 Impact factor: 13.583
