
CircPrimer 2.0: a software for annotating circRNAs and predicting translation potential of circRNAs.

Shanliang Zhong1, Jifeng Feng2.   

Abstract

BACKGROUND: Some circular RNAs (circRNAs) can be translated into functional peptides from small open reading frames (ORFs) in a cap-independent manner. Internal ribosomal entry sites (IRESs) and N6-methyladenosine (m6A) have been reported to drive translation of circRNAs. Experimental methods for confirming the presence of IRESs and m6A sites are time-consuming and labor-intensive, and the lack of computational tools to predict ORFs, IRESs and m6A sites for circRNAs makes such studies harder.
RESULTS: In this report, we present circPrimer 2.0, a Java-based software for annotating circRNAs and predicting ORFs, IRESs, and m6A sites of circRNAs. circPrimer 2.0 has both a graphical and a command-line interface, which allows the tool to be embedded into an analysis pipeline.
CONCLUSIONS: circPrimer 2.0 is an easy-to-use software for annotating circRNAs and predicting the translation potential of circRNAs, and is freely available at www.bio-inf.cn.
© 2022. The Author(s).

Year:  2022        PMID: 35668371      PMCID: PMC9169404          DOI: 10.1186/s12859-022-04705-y

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.307


Background

Circular RNAs (circRNAs) are a family of regulatory RNAs with a closed-loop structure, which means that they lack 5′ caps and 3′ poly(A) tails [1]. Although a great number of circRNAs have been identified, their functions are still largely unknown. CircRNAs are generally considered noncoding RNAs with various biological functions. To date, the vast majority of studies on circRNA function have focused on the miRNA-sponge activity of these molecules [2]. Nevertheless, some studies have reported that circRNAs can be translated into functional peptides from small open reading frames (ORFs) [3]. Because circRNAs do not have 5′ caps, they cannot be translated in a cap-dependent manner. Two mechanisms have been reported to initiate translation of circRNAs. First, an internal ribosomal entry site (IRES) recruits ribosomes to an internal site of the circRNA to initiate translation [3]. Second, N6-methyladenosine (m6A) drives translation with the help of the initiation factor eIF4G2 and the m6A reader YTHDF3 [4, 5]. Therefore, the presence of an ORF together with an IRES or an m6A site is a prerequisite for a circRNA to encode peptides. However, experimental methods for confirming the presence of IRESs and m6A modification sites are time-consuming and labor-intensive [5, 6], and the lack of computational tools to predict IRESs, m6A sites and ORFs for circRNAs makes such studies harder. At present, no tool predicts ORFs, IRESs or m6A modification sites specifically for circRNAs. Here, we present circPrimer 2.0, a user-friendly software to help researchers study circRNAs. We rewrote all the code of the former version of circPrimer [7]. CircPrimer 2.0 includes all features of the former version, with optimized performance. Besides annotating circRNAs and determining the specificity of circRNA primers, circPrimer 2.0 can show conserved circRNAs and predict ORFs, IRESs and m6A modification sites. The results are presented visually and can be saved in PDF format. CircPrimer 2.0 also provides a command-line interface, so it can be integrated into analysis pipelines.

Implementation

Prediction of ORFs

To predict ORFs for a circRNA, start codons and stop codons are searched in each frame. When two or more start codons are found upstream of a stop codon in the same frame, we choose the one farthest from the stop codon as the start codon. Studies have reported that a circRNA containing an infinite ORF can be efficiently translated to produce a long repeating peptide sequence [8, 9]; thus, we also predict infinite ORFs. The accuracy of ORF prediction was evaluated against ORFfinder (Linux x64; www.ncbi.nlm.nih.gov/orffinder/). There are two situations in predicting ORFs for circRNAs. In the first, the sequence length of the circRNA is evenly divisible by three. Figure 1a presents an example of this type of circRNA. In this situation, the frame does not shift during rolling circle translation. If there is a stop codon in a frame, the maximum length of an ORF in that frame is equal to the circRNA length. If an infinite ORF is found in a frame, the frame may produce a long repeating peptide sequence by rolling circle translation (Fig. 1a). The full sequence of the circRNA, from the start codon down to the terminal codon, comprises one round of rolling circle translation.
Fig. 1

The methods used to predict open reading frames (ORFs) and internal ribosomal entry sites (IRESs). a Predicting ORFs for a circRNA with a length that can be evenly divided by three. b Predicting ORFs for a circRNA with a length that cannot be evenly divided by three

In the second situation, the circRNA length is not evenly divisible by three. When an ORF spans the back-spliced junction, the frame shifts, and the ribosome returns to its original frame only after three laps around the circle. When there is a stop codon in a frame, the maximum length of an ORF in this frame is therefore three times the circRNA length (Fig. 1b). If an infinite ORF is found in a frame, the length of one repeat sequence, from the start codon down to the terminal codon, is also three times the circRNA length.
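
The two situations above can be illustrated with a minimal Python sketch. It is not the circPrimer 2.0 source code (which is written in Java). The function name find_orfs, the returned dictionary keys, the use of ATG as the only start codon, and the choice to include the stop codon in the reported length are all assumptions made for illustration. The sketch treats the codons of a circle as a closed ring: three independent rings when the length is divisible by three, and a single ring of three laps otherwise.

```python
STOPS = {"TAA", "TAG", "TGA"}
START = "ATG"  # assumption: only the canonical start codon is considered


def find_orfs(seq):
    """Return ORFs of a circular sequence (hypothetical helper).

    Each ORF is a dict with the 0-based start position, the length in nt
    (stop codon included; for infinite ORFs, the length of one repeat),
    and a flag telling whether the ORF lacks a stop codon.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n < 3:
        raise ValueError("sequence must be at least 3 nt long")
    doubled = seq + seq  # lets a codon read across the back-spliced junction

    if n % 3 == 0:
        # The frame never shifts: three separate rings of n/3 codons each.
        rings = [list(range(f, n, 3)) for f in range(3)]
    else:
        # The frame shifts at every lap: one ring of n codons (3 laps).
        rings = [[(3 * i) % n for i in range(n)]]

    orfs = []
    for ring in rings:
        codons = [doubled[p:p + 3] for p in ring]
        m = len(codons)
        stop_idx = [i for i, c in enumerate(codons) if c in STOPS]
        if not stop_idx:
            # No stop codon in the ring: any start codon gives an infinite ORF.
            starts = [i for i, c in enumerate(codons) if c == START]
            if starts:
                orfs.append({"start": ring[starts[0]],
                             "length": 3 * m, "infinite": True})
            continue
        for k, stop in enumerate(stop_idx):
            prev = stop_idx[k - 1]          # previous stop (wraps around)
            gap = (stop - prev) % m or m    # codons after prev, up to stop
            for step in range(1, gap):
                i = (prev + step) % m
                if codons[i] == START:
                    # First start after the previous stop, i.e. the start
                    # codon farthest from the current stop codon.
                    orfs.append({"start": ring[i],
                                 "length": 3 * (gap - step + 1),
                                 "infinite": False})
                    break
    return orfs
```

In this sketch, a 7-nt circle such as "ATGAAAC" yields one 18-nt ORF that passes the junction more than once, whereas the 6-nt circle "ATGAAA" yields an infinite ORF with a 6-nt repeat.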

Prediction of IRES

To predict IRESs, we used the TGBoost package (https://github.com/wepe/tgboost) to build models for IRES prediction with the 20,872 native IRES sequences reported by Gritsenko et al. [10]. Wang et al. have demonstrated that global kmer features alone can achieve high prediction performance [6]; thus, we established our models using global kmer features. We randomly divided the data into a training dataset (90%) and a test dataset (10%) and used tenfold cross validation to evaluate each combination of parameters. The best-fit parameters were summarized to generate the final set of model parameters. Wang et al. divided the kmer count by the sequence length to remove the influence of sequence length [6]. However, we found that an IRES within a long sequence obtains a negative result, because its kmer features are diluted by the long non-IRES sequence. Therefore, we split the full circRNA sequence into fragments of 174 nt, which is the sequence length in the data of Gritsenko et al. [10]. The step used to split the sequence is 20 nt, i.e. every two consecutive fragments have a 154-base overlap. The kmer frequencies are then calculated for each fragment. If two or more fragments are predicted as IRESs, the one nearest to the start codon is considered the IRES of an ORF. It should be noted that a positive result does not mean that the whole 174 nt fragment is an IRES, but that the fragment contains an IRES. The command-line interface can be used to predict IRESs with shorter fragments.
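
The fragment splitting and kmer normalization described above can be sketched as follows. This is an illustrative Python sketch, not the circPrimer 2.0 implementation: the function names, the default k = 3, and the choice to let windows wrap around the back-spliced junction are assumptions. The classifier itself is omitted.

```python
from collections import Counter
from itertools import product


def circular_windows(seq, size=174, step=20):
    """Yield (start, fragment) pairs over a circular sequence.

    Assumption: windows wrap around the back-spliced junction so that
    fragments spanning the junction are also produced.
    """
    n = len(seq)
    if n <= size:
        # Sequence shorter than one window: use it as a single fragment.
        yield 0, seq
        return
    doubled = seq + seq
    for start in range(0, n, step):
        yield start, doubled[start:start + size]


def kmer_frequencies(fragment, k=3):
    """Return kmer counts divided by the fragment length, for all 4**k kmers."""
    counts = Counter(fragment[i:i + k] for i in range(len(fragment) - k + 1))
    return {"".join(p): counts["".join(p)] / len(fragment)
            for p in product("ACGT", repeat=k)}


if __name__ == "__main__":
    circ = "ACGT" * 100  # toy 400-nt sequence, for illustration only
    for start, fragment in circular_windows(circ):
        features = kmer_frequencies(fragment)
        # A trained model would score `features` here.
        print(start, len(fragment), len(features))
```

Because every fragment has the same length, dividing by the fragment length no longer dilutes the kmer signal of an IRES embedded in a long circRNA, which is the motivation given in the text.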

Dataset of m6A modification sites

We downloaded m6A modification sites for human and mouse from m6A-Atlas [11]. m6A-Atlas is a comprehensive knowledgebase for unraveling the m6A epitranscriptome, which features a high-confidence collection of reliable m6A sites identified by seven base-resolution technologies, together with quantitative condition-specific epitranscriptome profiles estimated from high-throughput sequencing samples. Because the reference genome of the m6A sites is hg19 for human and GRCm38.p6 for mouse, we used Remap (www.ncbi.nlm.nih.gov/genome/tools/remap) to convert hg19 coordinates to hg38 and GRCm38.p6 coordinates to mm9. Because m6A modification occurs within the consensus DRACH/RRACH motifs [5, 12, 13], circPrimer 2.0 also shows all DRACH/RRACH motifs (D = A, G or U; R = G or A; H = A, C or U) in a sequence.
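
As an illustration of the motif definition above, the following Python sketch lists DRACH motifs in a circular sequence. It is a sketch under stated assumptions rather than the tool's own code: the function name drach_sites is hypothetical, the sequence is handled as DNA (U converted to T), and motifs spanning the back-spliced junction are included. Since R (G or A) is a subset of D (A, G or U), every RRACH motif is also a DRACH motif, so one pattern covers both.

```python
import re

# D = A/G/T, R = A/G, then A, C, and H = A/C/T (DNA alphabet).
# The lookahead allows overlapping motifs to be reported.
DRACH = re.compile(r"(?=([AGT][AG]AC[ACT]))")


def drach_sites(seq):
    """Return (motif_start, methylated_A_position, motif) tuples, 0-based.

    Assumption: the sequence is circular, so the first 4 nt are appended
    to the end to catch motifs spanning the back-spliced junction.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n < 5:
        raise ValueError("sequence must be at least 5 nt long")
    extended = seq + seq[:4]
    return [(m.start(), (m.start() + 2) % n, m.group(1))
            for m in DRACH.finditer(extended)]
```

For example, in this sketch the circle "ACTGG" contains one motif, GGACT, which starts at position 3 and crosses the junction, so its central A is at position 0.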

Identification of conserved circRNAs

We identified conserved circRNAs between Homo sapiens and Mus musculus using the following criteria: (1) the circRNAs are derived from the same gene; (2) their sequence lengths are identical; and (3) the identity of their sequences is greater than 80%.
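
The three criteria above can be expressed as a short Python check. This is a hedged sketch: the record layout (dictionaries with "gene" and "seq" keys), the case-insensitive comparison of gene symbols, and the position-by-position identity calculation (possible because the lengths must be equal) are assumptions, since the text does not state how gene names were matched or how identity was computed.

```python
def sequence_identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_conserved_pair(human, mouse, threshold=0.80):
    """Apply the three criteria to one human and one mouse circRNA record."""
    # (1) same host gene (assumption: symbols compared case-insensitively)
    if human["gene"].upper() != mouse["gene"].upper():
        return False
    # (2) identical sequence length
    if len(human["seq"]) != len(mouse["seq"]):
        return False
    # (3) sequence identity greater than the threshold
    return sequence_identity(human["seq"], mouse["seq"]) > threshold
```

Note that the comparison is strict, so a pair with exactly 80% identity would not pass in this sketch.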

Results

Features of circPrimer 2.0

CircPrimer 2.0 is written in Java and provides both a graphical and a command-line interface. Compared with circPrimer 1.2, circPrimer 2.0 can (1) predict ORFs and IRESs for any circRNA from its sequence; (2) be integrated into analysis pipelines; (3) show conserved circRNAs and their sequence identities between Homo sapiens and Mus musculus; (4) run on Windows, Mac OS X, Linux, and Solaris; (5) search and annotate circRNAs more quickly; (6) export data in different formats (FASTA, txt, or csv); (7) save figures in PDF format; and (8) search and annotate circRNAs of Mus musculus. Because we use a cloud database to store our data, the size of circPrimer has been reduced from 3 GB to 4 MB.

Evaluating ORF prediction accuracy

We randomly selected 1000 sequences from circBase and predicted ORFs using ORFfinder and circPrimer 2.0. Because ORFfinder is not designed to predict ORFs for circular sequences, the results of the two tools cannot be compared directly. First, we removed the ORFs spanning the back-spliced junctions from the circPrimer 2.0 results. Second, we removed the ORFs without a stop codon from the ORFfinder results. Third, we compared the remaining ORFs with each other. We found that the remaining ORFs of circPrimer 2.0 were identical to those of ORFfinder (Additional file 1: Data S1). Because Legnini et al. reported that a start codon located downstream of the first one in the same frame can also drive translation [14], circPrimer 2.0 highlights these inner start codons with a green background (Fig. 2).
Fig. 2

Predicted open reading frames (ORFs) and internal ribosomal entry sites (IRESs) are shown visually. ORF4 ~ ORF6 are infinite ORFs which lack a stop codon and are labeled with “a number × n” in the Length field. The number is the length of one repeat sequence. Green background, start codons in the same frame of the ORF; red background, stop codon. Red font in IRES field, IRESs spanning back-spliced junctions


Building models for IRES prediction and performance evaluation

Parameter tuning of the TGBoost model gave the following optimal values: eta = 0.03, max_depth = 5, scale_pos_weight = 8.78, subsample = 0.9, colsample_bytree = 0.5, min_child_weight = 19, gamma = 0, lamda = 1, alpha = 0. To test the accuracy on circRNAs, we searched PubMed for studies reporting coding circRNAs and obtained 10 human circRNAs [14-23]. One study did not report detailed information, so we were unable to obtain its circRNA sequence [16]. Another study did not assess the translation initiation mechanism [21]. Therefore, these two studies were removed. We used circPrimer 2.0 to predict ORFs for the remaining 8 circRNAs. All ORFs were predicted by circPrimer 2.0. When predicting IRESs, it failed to find an IRES in 3 circRNAs [15, 17, 22] and predicted at least one IRES in the other 5 circRNAs, giving a sensitivity of 63% (5 of 8; Additional file 2: Table S1). We also assessed the performance of the model using the test dataset. The accuracy of IRES prediction is 74.1%, the sensitivity is 64.8% and the specificity is 75.1%.
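
For readers who want to reproduce the reported figures, the standard definitions of the three metrics can be written as a short Python sketch. The function names are illustrative and not taken from the software. Only the counts stated in the text (5 circRNAs with a predicted IRES, 3 without) are used; the confusion matrix of the test dataset is not given in the text, so it is not reconstructed here.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


def accuracy(tp, fn, tn, fp):
    """Fraction of correct predictions among all predictions."""
    return (tp + tn) / (tp + fn + tn + fp)


if __name__ == "__main__":
    # 8 reported coding circRNAs: an IRES was predicted in 5 and missed in 3.
    print(f"{sensitivity(5, 3):.1%}")  # prints 62.5%
```

The value of 62.5% corresponds to the 63% quoted above after rounding.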

Showing the predicted ORFs, IRESs and m6A modification sites

After searching or annotating circRNAs or checking primers, the circRNAs are listed in the middle panel. Clicking an item opens a dialog showing the circRNA structure. If the ComboBox is set to “ORFs”, a right panel shows the predicted ORFs and IRESs. “None” in the IRES field means that no IRES was found in this circRNA; otherwise, the positions of the IRESs are shown. Because IRESs spanning back-spliced junctions may exhibit splicing-dependent IRES activity [24], circPrimer 2.0 highlights these IRESs in red font (Fig. 2). Clicking one item shows an ORF and its IRES visually, together with their detailed information (Fig. 2). If two or more items are selected, only the ORFs are shown visually. To indicate an infinite ORF, the length of the ORF is labeled with “a number × n”, where the number is the length of one repeat sequence (Fig. 2). The “ORF and m6A” panel shows the m6A modification sites.

Showing conserved circRNAs

After comparing circRNA sequences between Homo sapiens and Mus musculus, we obtained 3439 pairs of conserved circRNAs. The conserved circRNAs are shown in red font in the middle panel. When a conserved circRNA is clicked, the sequence identity between Homo sapiens and Mus musculus is shown in the bottom-right text area.

Discussion

In the present study, we present circPrimer 2.0, a Java-based software for annotating circRNAs and predicting ORFs and IRESs of circRNAs. At present, circRNADb and Circbank provide predicted ORFs and IRESs for a number of circRNAs [25, 26]. Because Circbank only shows the ORF size and the locations of IRESs, users are unable to obtain ORF locations or sequences and have to extract IRES sequences manually. In addition, Circbank used IRESfinder [27] to predict IRESs, which has been reported to have some obvious shortcomings [6]. circRNADb predicted IRESs using VIPS, a tool for predicting viral IRESs [28]. Neither database can show ORFs and IRESs for novel circRNAs. Therefore, circPrimer 2.0 is the first tool specifically designed to predict ORFs and IRESs of circRNAs.

Conclusions

We demonstrated the reliability of circPrimer 2.0 in predicting ORFs and IRESs. CircPrimer 2.0 shows the positions of ORFs, IRESs and m6A sites visually. Users can run the prediction with their preferred parameters using the command-line interface. CircPrimer 2.0 also shows conserved circRNAs and their sequence identities between Homo sapiens and Mus musculus. In summary, circPrimer 2.0 is an easy-to-use software for annotating circRNAs and predicting the translation potential of circRNAs.

Availability and requirements

Project name: circPrimer 2.0
Project home page: www.bio-inf.cn
Operating system(s): Windows, Mac OS X, Linux, and Solaris
Programming language: Java
Other requirements: Internet connectivity and Java 1.8.0 or higher
License: GNU General Public License version 3.0 (GPL-3.0)
Any restrictions to use by non-academics: None

Additional file 1: Python and R scripts as well as the data generated to evaluate the accuracy of circPrimer 2.0 in predicting open reading frames.
Additional file 2: Testing accuracy of circPrimer 2.0 in predicting IRESs and ORFs in reported coding circRNAs.
  28 in total

Review 1.  How are circRNAs translated by non-canonical initiation mechanisms?

Authors:  Leïla Halidou Diallo; Florence Tatin; Florian David; Anne-Claire Godet; Audrey Zamora; Anne-Catherine Prats; Barbara Garmy-Susini; Eric Lacazette
Journal:  Biochimie       Date:  2019-06-29       Impact factor: 4.079

2.  Identification of internal control genes for circular RNAs.

Authors:  Shanliang Zhong; Siying Zhou; Sujin Yang; Xinnian Yu; Hanzi Xu; Jinyan Wang; Qian Zhang; Mengmeng Lv; Jifeng Feng
Journal:  Biotechnol Lett       Date:  2019-08-19       Impact factor: 2.461

3.  IRESfinder: Identifying RNA internal ribosome entry site in eukaryotic cell using framed k-mer features.

Authors:  Jian Zhao; Jing Wu; Tianyi Xu; Qichang Yang; Junhao He; Xiaofeng Song
Journal:  J Genet Genomics       Date:  2018-07-18       Impact factor: 4.275

4.  m6A-dependent biogenesis of circular RNAs in male germ cells.

Authors:  Chong Tang; Yeming Xie; Tian Yu; Na Liu; Zhuqing Wang; Rebekah J Woolsey; Yunge Tang; Xinzong Zhang; Weibing Qin; Ying Zhang; Ge Song; Weiwei Zheng; Juan Wang; Weitian Chen; Xiongyi Wei; Zhe Xie; Rachel Klukovich; Huili Zheng; David R Quilici; Wei Yan
Journal:  Cell Res       Date:  2020-02-11       Impact factor: 25.617

5.  A novel protein encoded by the circular form of the SHPRH gene suppresses glioma tumorigenesis.

Authors:  Maolei Zhang; Nunu Huang; Xuesong Yang; Jingyan Luo; Sheng Yan; Feizhe Xiao; Wenping Chen; Xinya Gao; Kun Zhao; Huangkai Zhou; Ziqiang Li; Liu Ming; Bo Xie; Nu Zhang
Journal:  Oncogene       Date:  2018-01-18       Impact factor: 9.867

6.  circRNADb: A comprehensive database for human circular RNAs with protein-coding annotations.

Authors:  Xiaoping Chen; Ping Han; Tao Zhou; Xuejiang Guo; Xiaofeng Song; Yan Li
Journal:  Sci Rep       Date:  2016-10-11       Impact factor: 4.379

7.  Circ-ZNF609 Is a Circular RNA that Can Be Translated and Functions in Myogenesis.

Authors:  Ivano Legnini; Gaia Di Timoteo; Francesca Rossi; Mariangela Morlando; Francesca Briganti; Olga Sthandier; Alessandro Fatica; Tiziana Santini; Adrian Andronache; Mark Wade; Pietro Laneve; Nikolaus Rajewsky; Irene Bozzoni
Journal:  Mol Cell       Date:  2017-03-23       Impact factor: 17.970

8.  Novel Role of FBXW7 Circular RNA in Repressing Glioma Tumorigenesis.

Authors:  Yibing Yang; Xinya Gao; Maolei Zhang; Sheng Yan; Chengjun Sun; Feizhe Xiao; Nunu Huang; Xuesong Yang; Kun Zhao; Huangkai Zhou; Suyun Huang; Bo Xie; Nu Zhang
Journal:  J Natl Cancer Inst       Date:  2018-03-01       Impact factor: 11.816

9.  A novel protein encoded by circFNDC3B inhibits tumor progression and EMT through regulating Snail in colon cancer.

Authors:  Zihao Pan; Jianye Cai; Jiatong Lin; Huinian Zhou; Jingwen Peng; Jinliang Liang; Long Xia; Qi Yin; Baojia Zou; Jun Zheng; Liang Qiao; Lei Zhang
Journal:  Mol Cancer       Date:  2020-04-02       Impact factor: 27.401

10.  Circular HER2 RNA positive triple negative breast cancer is sensitive to Pertuzumab.

Authors:  Jie Li; Maoguang Ma; Xuesong Yang; Maolei Zhang; Jingyan Luo; Huangkai Zhou; Nunu Huang; Feizhe Xiao; Bingquan Lai; Weiming Lv; Nu Zhang
Journal:  Mol Cancer       Date:  2020-09-11       Impact factor: 27.401
