
circMine: a comprehensive database to integrate, analyze and visualize human disease-related circRNA transcriptome.

Wenliang Zhang^1,2,3, Yang Liu^2,4,5, Zhuochao Min^6, Guodong Liang^7, Jing Mo^3, Zhen Ju^2,8,9, Binghui Zeng^10,3, Wen Guan^3,11, Yan Zhang^2, Jianliang Chen^1, Qianshen Zhang^1, Hanguang Li^1, Chunxia Zeng^2,8,9, Yanjie Wei^2,8,9, Godfrey Chi-Fung Chan^1,12.

Abstract

Many circRNA transcriptome datasets have been deposited in public resources, but these data are highly heterogeneous. Researchers without bioinformatics skills have difficulty investigating these invaluable data or their own data. Here, we designed circMine (http://hpcc.siat.ac.cn/circmine and http://www.biomedical-web.com/circmine/), which provides 1 821 448 entries formed by 136 871 circRNAs, 87 diseases and 120 circRNA transcriptome datasets of 1107 samples across 31 human body sites. circMine further provides 13 online analytical functions for comprehensively investigating these datasets to evaluate the clinical and biological significance of circRNAs. To improve data applicability, each dataset was standardized and annotated with relevant clinical information. All 13 analytical functions allow users to group samples based on their clinical data, assign different parameters for different analyses, and perform these analyses on their own circRNA transcriptomes. Moreover, three additional tools were developed in circMine to systematically discover circRNA-miRNA interactions and circRNA translatability. For example, using circMine we systematically discovered five potentially translatable circRNAs associated with prostate cancer progression. In summary, circMine provides user-friendly web interfaces to browse, search, analyze and download data freely and to submit new data for further integration, and it can be an important resource for discovering significant circRNAs in different diseases.

Substances: RNA, Circular

Year: 2022 PMID: 34530446 PMCID: PMC8728235 DOI: 10.1093/nar/gkab809

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048

INTRODUCTION

Circular RNAs (circRNAs) are covalently closed endogenous RNAs produced by back-splicing of precursor messenger RNA (pre-mRNA) in eukaryotes, and they have largely been discovered through deep sequencing (1,2). Studies have shown that circRNAs play critical roles in important biological processes by acting as microRNA (miRNA) sponges or competing endogenous RNAs (ceRNAs), by regulating proteins, or even by being translated into peptides and proteins (1,2). They are also emerging as a novel type of biomarker for human disease (1,2). Although studies on circRNAs have accumulated rapidly over the last decade, their clinical and biological significance in human diseases remains elusive. Databases and computational tools have been developed for identifying and annotating transcribed circRNAs (3–6), tissue- and cell type-specific circRNAs (7–10), circRNA interactors (such as miRNAs and proteins) (11,12), potentially translatable circRNAs (13–15) and experimentally validated disease-associated circRNAs (16–19). For example, the circBase (20), CircBank (21) and circAtlas (22) databases provide comprehensive genomic and functional annotations for circRNAs. The CSCD (7) and CircRiC (8) databases were designed to annotate cancer-specific circRNAs. In addition, exoRBase (10) was developed to provide a landscape of human blood exosome circRNAs derived from RNA-seq data analyses. Recently, Li et al. designed riboCIRC (13) to host and investigate translatable circRNAs identified from ribosome sequencing data. However, no database yet integrates human disease-related circRNA transcriptome datasets and provides comprehensive online analyses of these datasets to identify the clinical and biological significance of circRNAs. Over the past 5 years, many human circRNA transcriptome datasets have been generated and deposited in the Gene Expression Omnibus (GEO) (23,24).
However, it is very challenging for researchers without bioinformatics skills to investigate these invaluable data or their own data, and it is also difficult to find circRNA data from specific physiological and pathological conditions among the massive data in GEO. Moreover, circRNA transcriptome data generated by different high-throughput platforms in GEO are highly heterogeneous, which further hinders their use. Thus, there is an urgent need for a dedicated database platform that integrates circRNA transcriptome data and provides comprehensive analytical functions to investigate them, facilitating the study and understanding of circRNAs in human disease. To overcome these challenges, we designed circMine (http://hpcc.siat.ac.cn/circmine and http://www.biomedical-web.com/circmine) to integrate human circRNA transcriptome data from GEO, and developed comprehensive web applications to investigate these data (Figure 1). Currently, circMine provides 1 821 448 entries formed by 136 871 circRNAs, 87 diseases and 120 circRNA transcriptome datasets of 1107 samples across 31 human body sites (Supplementary Table S1). To eliminate data heterogeneity, each dataset was standardized and manually annotated with specific physiological and pathological conditions for accurate data retrieval (Figure 1A). Moreover, to discover and identify significant circRNAs in human diseases, we developed 13 analytical functions to investigate each integrated dataset individually (Figure 1B, upper left panel). The Web Server application of circMine further allows researchers to conduct the 13 analyses on their own circRNA transcriptome data (Figure 1B, upper left panel).

Figure 1.

The scheme for data collection and manual curation (A) and the web application framework of circMine (B). GO, Gene Ontology; IRES, internal ribosome entry site element; KEGG, Kyoto Encyclopedia of Genes and Genomes; PCA, Principal Component Analysis.

In addition, circMine provides three tools to discover and identify the biological significance of circRNAs: circRNA-miRNA prediction, circRNA IRES prediction and ribo-circRNA location (Figure 1B, lower left panel). circMine also provides user-friendly web interfaces to browse, search and download data openly and to submit new circRNA transcriptome data for further integration (Figure 1B, right panel). Using circMine, we systematically identified five potentially translatable circRNAs that may contribute to prostate cancer progression by being translated and by interacting with more than 1500 miRNAs. In summary, circMine can improve our ability to discover and identify the significance of circRNAs in human diseases.

MATERIALS AND METHODS

Data collection and processing

To collect human circRNA transcriptome data, we searched GEO (23,24) with the keywords ‘[(circRNA OR circular RNA) AND Homo sapiens]’ and retrieved about 358 candidate datasets made public before 10 April 2021. All of these datasets were manually curated by at least two professional curators against two criteria: (i) datasets providing circRNA expression data from human samples (including tissue, plasma, exosome and cell line) were included regardless of the high-throughput platform, and (ii) the sample information and expression data of the dataset could be downloaded for further integration. After manual curation, 120 datasets were selected and downloaded. These datasets were highly heterogeneous because they were generated on 21 different high-throughput platforms. To overcome this heterogeneity, we developed a systematic pipeline to standardize and normalize the circRNA IDs, sample IDs and junction read counts in the datasets using the related annotation files in the circBase (20) and GEO (23,24) resources. The circRNA IDs, sample IDs and junction read counts were standardized to circBase IDs, GEO accessions and spliced reads per billion mapping (SRPBM) in about 82.50% (99/120), 28.33% (34/120) and 17.50% (21/120) of the datasets, respectively. In addition, we assigned each dataset a unique circMine ID (e.g., HSACM000001). We further manually annotated each dataset with specific physiological and pathological conditions, such as disease grade and stage, drug resistance, metastasis, virus infection, age and gender, for fast and accurate data retrieval.
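The text does not give the formula used to convert junction read counts to SRPBM. The sketch below assumes the definition commonly used in the circRNA literature: back-spliced junction reads divided by total mapped reads, scaled to one billion. Some studies also divide by read length, so that step is included as an option. The function names and the per-sample dictionary layout are hypothetical and are not taken from circMine's pipeline.

```python
from typing import Dict, Optional


def srpbm(junction_reads: float, total_mapped_reads: float,
          read_length: Optional[int] = None) -> float:
    """Spliced reads per billion mapping for one circRNA in one sample.

    Assumed definition: junction_reads / total_mapped_reads * 1e9.
    If read_length is given, the value is additionally divided by it
    (a variant used by some studies; an assumption here).
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    value = junction_reads / total_mapped_reads * 1e9
    if read_length is not None:
        value /= read_length
    return value


def normalize_sample(counts: Dict[str, int], total_mapped_reads: float,
                     read_length: Optional[int] = None) -> Dict[str, float]:
    """Convert one sample's raw junction counts (keyed by circRNA ID) to SRPBM."""
    return {circ_id: srpbm(n, total_mapped_reads, read_length)
            for circ_id, n in counts.items()}


# Hypothetical sample: junction counts out of 25 million mapped reads.
sample = {"circ_example_1": 50, "circ_example_2": 5}
print(normalize_sample(sample, 25_000_000))
```

Passing `read_length` switches to the length-normalized variant. Because the two conventions differ by a constant factor per sample, values from both should not be mixed within one dataset.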

Differential expression module and co-expression module

To comprehensively investigate the integrated data in circMine with customized grouping and settings, we designed seven differential analytical functions in the differential expression module and six co-expression analytical functions in the co-expression module, based on R (https://www.r-project.org/) (Figure 1B, upper left panel, Table 1 and Supplementary Figure S1). Supplementary Table S2 and the ‘Help’ web-page of the database give the details of these 13 analytical functions, including the major R packages and functions used to implement the data processing, analysis and visualization.

Table 1.

The description of the 13 analytical functions in the differential expression and co-expression modules

Web analysis	Description
Differential expression module
General analysis	To generate a heatmap, a principal component analysis (PCA) plot and box plots of the data
Differential expression	To identify all significantly differentially expressed circRNAs between two conditions
Boxplot	To present the expression difference of a circRNA across different conditions as a box plot
Volcano plot	To depict the expression differences of one or more circRNAs between two conditions as a volcano plot
Heatmap plot	To depict circRNA expression patterns across different conditions as a heatmap
GO enrichment	GO enrichment of the host genes of the differentially expressed circRNAs between two conditions
KEGG enrichment	KEGG enrichment of the host genes of the differentially expressed circRNAs between two conditions
Co-expression module
Co-expression	To correlate a circRNA with all other circRNAs under specific conditions
Linear graph	To present the correlation between two circRNAs under specific conditions
Boxplot	To conduct a paired correlation analysis of two circRNAs under specific conditions
Corrplot	To depict the correlations among multiple circRNAs under specific conditions
GO enrichment	GO enrichment of the host genes of the co-expressed circRNAs under specific conditions
KEGG enrichment	KEGG enrichment of the host genes of the co-expressed circRNAs under specific conditions
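The co-expression function correlates one circRNA with every other circRNA. The sketch below shows this idea. It assumes Pearson correlation and uses the |Correlation| ≥ 0.8 cutoff quoted in the Figure 4 legend. It does not reproduce circMine's R implementation or its P-value filter (see Supplementary Table S2), and `coexpressed_with` is a hypothetical helper.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant vector has undefined correlation
    return sxy / math.sqrt(sxx * syy)


def coexpressed_with(target: str, expr: Dict[str, List[float]],
                     min_abs_r: float = 0.8) -> List[Tuple[str, float]]:
    """circRNAs whose |r| with `target` is >= min_abs_r, strongest first."""
    ref = expr[target]
    hits = []
    for circ_id, values in expr.items():
        if circ_id == target:
            continue
        r = pearson(ref, values)
        if not math.isnan(r) and abs(r) >= min_abs_r:
            hits.append((circ_id, r))
    return sorted(hits, key=lambda h: abs(h[1]), reverse=True)


# Hypothetical log2 expression values of four circRNAs across five samples.
expr = {
    "circA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "circB": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with circA
    "circC": [5.0, 4.1, 2.9, 2.0, 1.2],  # falls as circA rises
    "circD": [3.0, 1.0, 4.0, 1.0, 3.0],  # unrelated to circA
}
print(coexpressed_with("circA", expr))
```

Both positive and negative correlations pass the cutoff because it is applied to |r|. This matches the absolute-value form of the threshold in the Figure 4 legend.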


circRNA-miRNA prediction

Because circRNAs can act as miRNA sponges and ceRNAs to regulate many biological pathways in human diseases, we developed the circRNA-miRNA prediction tool to identify putative circRNA-miRNA interactions based on the miRanda (25), miRBase (26) and circBase (20) resources. We first extracted human miRNA seed region sequences from miRBase and circRNA sequences from circBase. After installing miRanda locally, we wrote a shell script that wraps miRanda with the extracted sequence data.
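The shell wrapper itself is not published. As an illustration only, the sketch below builds a miRanda command line with the score (`-sc`) and energy (`-en`) thresholds. It then applies the cutoffs used in the prostate cancer case study (score ≥ 140, energy ≤ -7.0) to hits that have already been parsed. The file names, the `MirandaHit` record and the helper names are hypothetical. The sketch does not run miRanda or parse its raw output.

```python
from dataclasses import dataclass
from typing import Iterable, List


def miranda_command(mirna_fasta: str, circ_fasta: str, out_path: str,
                    score_cutoff: float = 140.0,
                    energy_cutoff: float = -7.0) -> List[str]:
    """Argument list for the miRanda CLI: query miRNAs first, target circRNAs second."""
    return ["miranda", mirna_fasta, circ_fasta,
            "-sc", str(score_cutoff),
            "-en", str(energy_cutoff),
            "-out", out_path]


@dataclass
class MirandaHit:
    mirna: str
    circrna: str
    score: float
    energy: float  # kcal/mol; more negative means more stable binding


def filter_hits(hits: Iterable[MirandaHit], score_cutoff: float = 140.0,
                energy_cutoff: float = -7.0) -> List[MirandaHit]:
    """Keep hits with score >= score_cutoff and energy <= energy_cutoff."""
    return [h for h in hits if h.score >= score_cutoff and h.energy <= energy_cutoff]


cmd = miranda_command("mirbase_human.fa", "circbase_query.fa", "hits.txt")
# subprocess.run(cmd, check=True) would execute it if miRanda is installed.
hits = [MirandaHit("miR-A", "circ1", 150.0, -12.3),   # hypothetical hits
        MirandaHit("miR-B", "circ1", 139.0, -20.0),
        MirandaHit("miR-C", "circ2", 160.0, -5.0)]
print([h.mirna for h in filter_hits(hits)])
```

Passing an argument list instead of a single shell string avoids quoting problems when user-supplied file names contain spaces.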

circRNA IRES prediction

Because internal ribosome entry site elements (IRESs) in circRNAs can drive their translation (2), we developed the circRNA IRES prediction tool to identify experimentally validated human IRESs in circRNAs based on the BLAST (27), IRESbase (28) and circBase (20) resources. First, we installed BLAST (ncbi-blast-2.11.0+-x64-linux), extracted the experimentally validated human IRES sequences from IRESbase, and built a BLAST index database of these human IRESs. We then extracted human circRNA sequences from circBase. Finally, we wrote a shell script that wraps BLAST (including the IRES index database) with the extracted circRNA sequences to implement this tool.
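The sketch below outlines the IRES search. It assumes the BLAST+ programs `makeblastdb` and `blastn` with tabular output (`-outfmt 6`, in which the 11th column is the E-value). The paper does not state which BLAST program or options were used, so `blastn`, the file names and the helper names are assumptions. The code builds the commands and parses the tabular hits. It keeps circRNAs with at least one IRES hit at E-value ≤ 1E-5, the threshold used in the case study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def blast_commands(ires_fasta: str, db_name: str, circ_fasta: str,
                   out_tsv: str, evalue: float = 1e-5) -> List[List[str]]:
    """Argument lists to index the IRES sequences and query circRNAs against them."""
    return [
        ["makeblastdb", "-in", ires_fasta, "-dbtype", "nucl", "-out", db_name],
        ["blastn", "-query", circ_fasta, "-db", db_name,
         "-evalue", str(evalue), "-outfmt", "6", "-out", out_tsv],
    ]


def circrnas_with_ires(tsv_lines: Iterable[str],
                       max_evalue: float = 1e-5) -> Dict[str, Set[str]]:
    """Map each query circRNA to the IRES IDs it hits at E-value <= max_evalue.

    Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits: Dict[str, Set[str]] = defaultdict(set)
    for line in tsv_lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, evalue = cols[0], cols[1], float(cols[10])
        if evalue <= max_evalue:
            hits[query].add(subject)
    return dict(hits)


# Hypothetical tabular output lines.
lines = [
    "circX\tIRES_1\t98.5\t120\t1\t0\t10\t129\t1\t120\t1e-50\t220\n",
    "circX\tIRES_2\t90.0\t40\t4\t0\t300\t339\t5\t44\t0.002\t50.1\n",
    "circY\tIRES_3\t100.0\t60\t0\t0\t1\t60\t1\t60\t3e-25\t111\n",
]
print(circrnas_with_ires(lines))
```

The E-value is filtered again when parsing, so the same output file can be re-thresholded without rerunning BLAST.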

ribo-circRNA location tool

To identify ribosome-associated circRNAs (ribo-circRNAs) and predict the subcellular localization of their putative peptides and proteins, we implemented the ribo-circRNA location tool based on riboCIRC (13), DeepLoc (29) and circBase (20). First, we extracted ribo-circRNAs and the amino acid sequences of their putative peptides and proteins from the riboCIRC database. Second, we annotated each ribo-circRNA with its circRNA IDs in different resources such as circBase, CircBank and RefSeq. Third, we installed DeepLoc (version 1.0) and ran it with default parameters to assign the putative peptides and proteins to ten subcellular localizations based on their amino acid sequences. Finally, we implemented the ribo-circRNA location tool in R and wrapped it in shell scripts.

circRNA name convert tool

To enable conversion between circRNA IDs from different resources, we first constructed an annotation file that links circRNAs to their IDs in different resources and reference genomes, using circBase (20), CircBank (21), riboCIRC (13), UCSC liftOver (30) and the annotation files of the GPL19978 and GPL21825 platforms in GEO (23,24). GPL19978 and GPL21825 are two platforms commonly used for human circRNA expression profiling. We then implemented the circRNA name convert tool in R and wrapped it, together with the constructed annotation file, in shell scripts.
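The name convert tool is described as a lookup against a single annotation file that links each circRNA's identifiers across resources. The sketch below shows that idea. It assumes a tab-separated table with one row per circRNA and one column per resource. The column names, the `NameConverter` class and all IDs in the example table are hypothetical placeholders.

```python
import csv
import io
from typing import Dict, Optional


class NameConverter:
    """Convert circRNA IDs between resources through one shared annotation table."""

    def __init__(self, table_text: str):
        reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
        self.columns = reader.fieldnames or []
        # index[resource][identifier] -> the full annotation row
        self.index: Dict[str, Dict[str, Dict[str, str]]] = {c: {} for c in self.columns}
        for row in reader:
            for col in self.columns:
                if row[col]:  # skip empty cells (no ID in this resource)
                    self.index[col][row[col]] = row

    def convert(self, identifier: str, source: str, target: str) -> Optional[str]:
        """Return the `target` ID for `identifier` given in `source`, or None."""
        row = self.index.get(source, {}).get(identifier)
        if row is None:
            return None
        return row.get(target) or None


# Hypothetical annotation table (placeholder IDs, not real mappings).
table = (
    "circbase\tcircbank\thg38\n"
    "hsa_circ_example1\tcircbank_example1\tchr1:100-500\n"
    "hsa_circ_example2\t\tchr2:200-900\n"
)
conv = NameConverter(table)
print(conv.convert("circbank_example1", "circbank", "circbase"))
```

Indexing every column lets any resource act as the source without building a separate mapping for each pair of resources.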

Web Server

Web Server is a systematic pipeline that automatically integrates and standardizes circRNA transcriptome data and the corresponding sample information uploaded by users. It also assigns a temporary ID to the uploaded data so that users can analyze them and later remove them from the database. The Web Server application was implemented in R based on the annotation file constructed above, which is used to convert the circRNA IDs in the uploaded data to circBase IDs. The application handles various types of circRNA expression values in the uploaded data, including junction read counts, transcripts per million (TPM), fragments per kilobase of exon model per million mapped fragments (FPKM), SRPBM and normalized values with or without log2 transformation. In addition, for data security, users can remove the private data they uploaded at any stage using the assigned temporary ID, and uploaded data are cleaned up monthly.
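To handle mixed inputs, the Web Server must decide whether uploaded values are already log2-transformed. The paper does not describe how it does this. The sketch below assumes a quantile heuristic modelled on the one in GEO2R's generated R script: data whose 99th percentile exceeds 100, or whose range exceeds 50 with a positive lower quartile, are treated as unlogged. GEO2R sets non-positive values to NaN and takes log2(x). This sketch instead uses log2(x + 1) so that zero junction counts stay finite. The thresholds and helper names are assumptions.

```python
import math
from typing import List


def quantile(values: List[float], q: float) -> float:
    """Linear-interpolation quantile (the same rule as R's default type 7)."""
    s = sorted(values)
    pos = (len(s) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def looks_unlogged(values: List[float]) -> bool:
    """Heuristic: raw counts, TPM, FPKM or SRPBM usually have a long upper tail."""
    if not values:
        return False
    q = [quantile(values, p) for p in (0.0, 0.25, 0.5, 0.75, 0.99, 1.0)]
    return q[4] > 100 or (q[5] - q[0] > 50 and q[1] > 0)


def to_log2(values: List[float]) -> List[float]:
    """Return log2(x + 1) if the data look unlogged, otherwise a copy unchanged."""
    if looks_unlogged(values):
        return [math.log2(v + 1) for v in values]
    return list(values)


raw = [0, 3, 10, 250, 1023]           # hypothetical junction read counts
logged = [1.2, 3.5, 5.0, 7.8, 12.1]   # hypothetical log2-scale values
print(to_log2(raw))
print(to_log2(logged))
```

A heuristic like this can misclassify small or unusual datasets. A user-selectable "already log2-transformed" option, which the Web Server's description suggests, is the safer primary control.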

Data storage and web implementation

circMine is freely available at http://hpcc.siat.ac.cn/circmine/ and http://www.biomedical-web.com/circmine. All annotation data are stored and managed in a MySQL database. R version 4.0.3 (https://www.r-project.org/) was installed to run the analyses. Supplementary Tables S2 and S3 describe the R packages and the resources used in the database implementation, respectively. The database was implemented with separate back-end and front-end web frameworks. The back end was built with Spring Boot (https://spring.io/projects/spring-boot/), while the front end was built with Vue 3 (https://vuejs.org/), jQuery (https://jquery.com/) and Bootstrap 4 (https://getbootstrap.com/). The programs for data processing and application operation were written in Java. Finally, the database was deployed on an Apache Tomcat server.

RESULTS

A comprehensive resource for human circRNA transcriptome data with specific pathological and physiological conditions

circMine contains 1 821 448 entries formed by 136 871 circRNAs, 87 diseases and 120 circRNA transcriptome datasets of 1107 samples across 31 human body sites (Supplementary Table S1). The diseases include various cancers, infectious and immune-inflammatory diseases, and diseases of the heart, brain, digestive system, kidney, spine, oral cavity, bone, lung and blood vessels. The datasets are derived from tissue (75.83%, 91/120), plasma (13.33%, 16/120), exosome (2.50%, 3/120) and cell line (8.33%, 10/120) samples. To eliminate the heterogeneity of datasets produced by different high-throughput platforms, the circRNA IDs, sample IDs and expression values in each dataset have been standardized and normalized using circBase (20) and the annotation files in GEO (23,24). To facilitate fast and accurate retrieval, each dataset has been manually annotated with specific pathological and physiological conditions, including disease grade and stage, genotype, drug resistance, metastasis, immune status, lifestyle, virus infection, age and gender. The ‘Home’ web-page (http://hpcc.siat.ac.cn/circmine) provides an overview of the data contents and features of the database. For example, the six body sites with the largest numbers of datasets and samples are brain, heart, bone and bone marrow, blood and blood vessel, intestine and liver. Moreover, most of the datasets (about 57.5%, 69/120) are associated with 37 cancer subtypes. The three cancer subtypes with the largest numbers of datasets are gastric cancer, hepatocellular carcinoma and colorectal carcinoma, accounting for about 6.7% (8/120), 6.7% (8/120) and 4.2% (5/120) of the datasets, respectively. In addition, the statistics on the ‘Home’ web-page show that the number of circRNA transcriptome datasets has increased rapidly over the past 5 years. They also show that 91.7% of the data are from China, with the remainder from Germany, USA, Australia, Spain, Sweden, Japan and the Netherlands.
Furthermore, circMine provides user-friendly web interfaces to browse, search and access data openly, as well as to download data and submit new data for further integration. For example, the search function on the download web-page enables users to quickly find and download specific data for their research. circMine also provides several common download formats, including plain-text tables, CSV and JSON. For more flexibility, the ‘Submit’ application page allows researchers to submit new circRNA transcriptome data with sample information to the database for future integration. After careful evaluation (for example, of data quality and ethical approval) by our submission committee, submitted data will be included in a future release.

Comprehensive analysis to investigate the integrated and researchers' own human circRNA transcriptome data

To analyze the integrated disease-related circRNA transcriptome data individually and discover the clinical and biological significance of circRNAs, we implemented seven differential expression and six co-expression analytical functions in the differential expression and co-expression modules, respectively (Figure 1B, upper left panel and Table 1). All 13 analytical functions allow users to group samples according to their clinical metadata and to set parameters for each analysis (Supplementary Figure S1). The details of the 13 analytical functions are given in Table 1 and on the ‘Help’ web-page of the database. To better serve the community, we developed the Web Server application, which allows users to upload their own circRNA transcriptome data and run the 13 analyses on the uploaded data themselves. Because circRNA expression profiles from different researchers are generated on different platforms, the Web Server application handles various types of circRNA IDs (such as circBase and CircBank IDs). It also handles various types of expression values (including junction read counts, TPM, FPKM, SRPBM and normalized values with or without log2 transformation) in the uploaded data. In addition, to protect uploaded data, the Web Server allows users to delete their data at any stage using the assigned temporary ID, and uploaded data are cleaned up monthly. To further support the circRNA research community, circMine also provides three additional tools to study the biological mechanisms of circRNAs in human disease: circRNA-miRNA prediction, circRNA IRES prediction and ribo-circRNA location (Figure 1B, lower left panel). These three tools are described below.

circRNA-miRNA prediction. This tool aims to discover putative interactions between miRNAs and circRNAs. Users can enter a list of circBase IDs or full-length circRNA sequences in FASTA format, or upload a text file containing either. Score cutoff and energy cutoff parameters are provided to filter for significant interacting miRNAs.

circRNA IRES prediction. To assess the translatability of circRNAs, this tool identifies experimentally validated human IRESs in circRNAs. Users can enter a list of circBase IDs or full-length circRNA sequences in FASTA format, or upload a text file containing either. An E-value parameter sets the significance threshold for IRES hits.

ribo-circRNA location. This tool aims to identify ribo-circRNAs and predict the subcellular localization of their putative peptides and proteins. It distinguishes ten subcellular localizations, such as nucleus, extracellular, cytoplasm, mitochondrion, cell membrane and endoplasmic reticulum. Users can enter a list of circRNAs using various nomenclatures, such as circBase IDs, CircBank IDs, RefSeq IDs, and hg19 or hg38 chromosome positions.

circMine also provides a circRNA name convert tool that automatically annotates circRNAs across different resources and reference genomes, including circBase, CircBank, riboCIRC, RefSeq, GenBank, hg19 and hg38, and the common GEO platforms GPL19978 and GPL21825. circRNA coordinates from different sources may differ slightly because of 0-based versus 1-based coordinate conventions or potential sequencing errors. The circRNA name convert and ribo-circRNA location tools therefore allow a 2 bp mismatch during circRNA coordinate conversion. Finally, the results of the above analyses and tools are presented as graphs or tables. Graphs can be freely downloaded and saved as high-resolution PDF files, while result tables offer filtering and sorting functions that allow users to easily find data of interest (Supplementary Figure S2). A menu in the table header lets users choose which columns and rows to display (Supplementary Figure S2). Moreover, right-clicking on the table body opens a menu for copying and exporting the table results for further analyses (Supplementary Figure S2). In addition, the result tables link to related resources that give detailed information about each circRNA, ribo-circRNA, interacting miRNA and human IRES. A comprehensive tutorial for these applications is available on the database ‘Help’ web-page.
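The 2 bp tolerance used by the circRNA name convert and ribo-circRNA location tools can be sketched as a coordinate comparison in which each back-splice end may differ by at most 2 bp. How circMine applies the tolerance is not specified. This sketch assumes it applies per end, on the same chromosome, and on the same strand when both strands are known. The `Locus` record, the helper names and the coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Locus:
    chrom: str
    start: int
    end: int
    strand: Optional[str] = None  # "+", "-" or None if unknown


def same_circ(a: Locus, b: Locus, tolerance: int = 2) -> bool:
    """True if both ends agree within `tolerance` bp on the same chromosome/strand."""
    if a.chrom != b.chrom:
        return False
    if a.strand and b.strand and a.strand != b.strand:
        return False
    return abs(a.start - b.start) <= tolerance and abs(a.end - b.end) <= tolerance


def match_locus(query: Locus, reference: Iterable[Locus],
                tolerance: int = 2) -> List[Locus]:
    """All reference loci matching the query, closest first."""
    hits = [r for r in reference if same_circ(query, r, tolerance)]
    return sorted(hits, key=lambda r: abs(r.start - query.start) + abs(r.end - query.end))


# Hypothetical reference loci; the query's start is off by one, as a 0-based/1-based shift would be.
reference = [Locus("chr1", 1000, 2500, "+"),
             Locus("chr1", 1003, 2500, "+"),
             Locus("chr2", 1000, 2500, "+")]
print(match_locus(Locus("chr1", 1001, 2500, "+"), reference))
```

Returning all candidates sorted by distance, instead of only the first match, makes ambiguous conversions visible when two annotated circRNAs lie within the tolerance window.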

Case study: comprehensive analysis to discover potential translatable circRNAs associated with prostate cancer progression using circMine

To identify key circRNAs associated with prostate cancer progression, we comprehensively investigated the circRNA transcriptome dataset (circMine ID: HSACM000016) (31) using circMine. First, we classified the samples into two groups, low (Gleason < 6) and high (Gleason > 8), based on the sample information of the dataset (Supplementary Figure S1). We then performed the seven analyses in the differential expression module. The general analysis showed significantly different circRNA expression profiles between the two groups (Figure 2A and Supplementary Figure S3A,B). The differential expression analysis identified 2052 deregulated circRNAs (|logFC| ≥ 1.0 and P-value ≤ 0.05), including 1197 up-regulated and 855 down-regulated circRNAs (Figure 2B). The GO and KEGG enrichment results (Figure 2C and Supplementary Figure S3C–F) suggested that these up-regulated and down-regulated circRNAs are enriched in critical biological functions and pathways associated with prostate cancer progression (32–35). These include protein modification, cell adhesion, actin cytoskeleton, proteoglycans, GTPase activity, ErbB signaling, endocytosis, thyroid hormone signaling, and RNA and DNA processing (Figure 2C and Supplementary Figure S3C–F). Consistent with recent studies (29,36–39), these pilot results support the view that circRNAs play critical roles in regulating prostate cancer progression.
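The grouping and thresholds in this case study can be illustrated with a short sketch. It assigns samples to the low (Gleason < 6) and high (Gleason > 8) groups, then classifies precomputed differential-expression results using |logFC| ≥ 1.0 and P-value ≤ 0.05. It assumes logFC is log2(high/low), so positive values mean higher expression in the high-grade group. The statistical test itself is not reproduced (circMine's R packages are listed in Supplementary Table S2). The helper names and example values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def gleason_group(score: int) -> Optional[str]:
    """Assign a sample to 'low' (< 6) or 'high' (> 8); other scores are excluded."""
    if score < 6:
        return "low"
    if score > 8:
        return "high"
    return None


def classify_deregulated(results: Dict[str, Tuple[float, float]],
                         min_abs_logfc: float = 1.0,
                         max_p: float = 0.05) -> Dict[str, List[str]]:
    """Split circRNAs into up/down lists from {id: (logFC, p_value)}."""
    up, down = [], []
    for circ_id, (logfc, p) in results.items():
        if p > max_p or abs(logfc) < min_abs_logfc:
            continue  # not significant or fold change too small
        (up if logfc > 0 else down).append(circ_id)
    return {"up": sorted(up), "down": sorted(down)}


# Hypothetical results: (log2 fold change high vs low, P-value).
results = {"c1": (1.5, 0.01), "c2": (-2.0, 0.001), "c3": (0.5, 0.001),
           "c4": (3.0, 0.2), "c5": (-1.0, 0.05)}
print(classify_deregulated(results))
```

Both boundaries are inclusive (|logFC| = 1.0 and P = 0.05 pass), which matches the "≥" and "≤" in the stated thresholds.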

Figure 2.

The circRNA differential expression analysis results from five low (Gleason < 6) and five high (Gleason > 8) grade prostate cancer tissues. (A) Heatmap showing significantly different circRNA expression profiles between the two groups. (B) Pie chart of the numbers and percentages of deregulated, up-regulated and down-regulated circRNAs. (C) Bar and dot plots showing the GO and KEGG enrichment results, respectively, for the 2052 deregulated circRNAs (|logFC| ≥ 1.0 and P-value ≤ 0.05).

Using the ribo-circRNA location tool, we further identified 190 ribo-circRNAs among the 2052 deregulated circRNAs, together with 263 putative peptides and proteins translated by these ribo-circRNAs (Supplementary Table S4). The predicted subcellular localizations of the 263 peptides and proteins varied: mitochondrion (28.52%, 75/263), cytoplasm (22.05%, 58/263), nucleus (22.05%, 58/263), extracellular (17.49%, 46/263), endoplasmic reticulum (4.56%, 12/263), plastid (2.28%, 6/263), lysosome/vacuole (1.14%, 3/263), peroxisome (0.76%, 2/263), Golgi apparatus (0.76%, 2/263) and cell membrane (0.38%, 1/263) (Figure 3A). Furthermore, the heatmaps suggested that the 190 ribo-circRNAs can significantly distinguish different grades of prostate cancer regardless of the subcellular localization of their putative peptides and proteins (Figure 3B).
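Each localization percentage is the number of putative peptides and proteins assigned to that compartment divided by the 263 total. The sketch below tallies predicted localization labels (for example, DeepLoc class names) and reports each label's count and percentage. The input list is rebuilt from the per-compartment counts given in the text, not read from actual DeepLoc output, and `localization_summary` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def localization_summary(labels: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (localization, count, percent rounded to 2 dp), most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(loc, n, round(100 * n / total, 2)) for loc, n in counts.most_common()]


# Per-compartment counts for the 263 putative peptides and proteins, as given in the text.
reported = {"Mitochondrion": 75, "Cytoplasm": 58, "Nucleus": 58, "Extracellular": 46,
            "Endoplasmic reticulum": 12, "Plastid": 6, "Lysosome/Vacuole": 3,
            "Peroxisome": 2, "Golgi apparatus": 2, "Cell membrane": 1}
labels = [loc for loc, n in reported.items() for _ in range(n)]
for loc, n, pct in localization_summary(labels):
    print(f"{loc}: {n} ({pct}%)")
```

Because each value is rounded independently, the printed percentages need not add up to exactly 100%.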

Figure 3.

Deregulated ribo-circRNAs significantly distinguish different grades of prostate cancer regardless of the subcellular localization of their putative peptides and proteins. (A) Distribution of the subcellular localizations of the 263 putative peptides and proteins translated by the 190 deregulated ribo-circRNAs. (B) Heatmaps showing that the 190 ribo-circRNAs significantly distinguish different grades of prostate cancer regardless of the subcellular localization of their putative peptides and proteins.

Next, the circRNA IRES prediction tool identified 79 circRNAs with at least one experimentally validated human IRES (E-value ≤ 1E-5) among the 2052 deregulated circRNAs (Figure 4A and Supplementary Table S5). Among these 79 circRNAs, we identified five ribo-circRNAs associated with prostate cancer progression that have the best translation potential: hsa_circ_0003700, hsa_circ_0003458, hsa_circ_0001112, hsa_circ_0008351 and hsa_circ_0003643 (Figure 4A and B; Supplementary Figure S4A and Supplementary Table S6). In addition, the boxplot and corrplot analyses in the co-expression module showed that these five circRNAs are significantly correlated with each other (Figure 4C and Supplementary Figure S4B–K). The circRNA-miRNA prediction tool identified 1532 miRNAs (score ≥ 140 and energy ≤ -7.0) that may interact with the five circRNAs (Supplementary Table S7). Finally, the functional enrichment results for the circRNAs co-expressed with the five circRNAs are consistent with those for the 2052 deregulated circRNAs (Figures 2C and 4D–H). This suggests that these five circRNAs may play a vital role in regulating prostate cancer progression by being translated and by interacting with miRNAs, although validation in vitro and in vivo is still needed. In summary, the case study demonstrates that circMine can serve as an important resource for understanding the significance of circRNAs in human diseases.
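As the Figure 4A legend states, the five candidates are deregulated circRNAs that are both ribo-circRNAs and carriers of at least one validated IRES. The sketch below expresses this selection as a set intersection. It uses hypothetical IDs in place of the real lists, which are in Supplementary Tables S4 and S5.

```python
from typing import Iterable, List


def translatable_candidates(deregulated: Iterable[str],
                            ribo_circrnas: Iterable[str],
                            ires_circrnas: Iterable[str]) -> List[str]:
    """circRNAs that are deregulated, ribosome-associated and carry an IRES hit."""
    return sorted(set(deregulated) & set(ribo_circrnas) & set(ires_circrnas))


# Hypothetical ID sets standing in for the 2052 / 190 / 79 circRNA lists.
deregulated = {"circ1", "circ2", "circ3", "circ4", "circ5"}
ribo = {"circ1", "circ2", "circ4"}
ires = {"circ2", "circ4", "circ5"}
print(translatable_candidates(deregulated, ribo, ires))  # ['circ2', 'circ4']
```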

Figure 4.

Comprehensive investigation to discover potentially translatable circRNAs and identify their biological and clinical significance in prostate cancer progression. (A) Of the 2052 deregulated circRNAs, 190 are ribo-circRNAs, 79 have human IRES(s) (E-value ≤ 1E-5), and five are ribo-circRNAs with experimentally validated human IRES(s): hsa_circ_0003700, hsa_circ_0003458, hsa_circ_0001112, hsa_circ_0008351 and hsa_circ_0003643. (B) Volcano plot showing that the five circRNAs differ significantly between low- and high-grade prostate cancers. (C) Corrplot showing that the expression patterns of the five circRNAs are significantly correlated with each other. *: P-value ≤ 0.05, **: P-value ≤ 0.01 and ***: P-value ≤ 0.001. (D–H) GO and KEGG enrichment of the five circRNAs based on the annotations of the host genes of their co-expressed circRNAs (|Correlation| ≥ 0.8 and P-value ≤ 0.05).

DISCUSSION

circMine is the first comprehensive database designed to systematically integrate, analyze and visualize human circRNA transcriptome data under specific physiological and pathological conditions. Other circRNA databases (7–11,13–15,21,22,40,41) provide only genomic, expression-pattern, functional and structural annotations for common and tissue-specific circRNAs. Compared with them, circMine provides unique data and functions that fill some gaps in existing services. For example, disease-related databases such as CircR2Disease (16) and circRNADisease (17) provide a limited amount of experimentally validated circRNA-disease association data manually curated from publications. In contrast, circMine provides 1 821 448 entries of 136 871 circRNAs, 87 diseases and 120 circRNA transcriptome datasets (Supplementary Table S1). Moreover, circMine provides 13 analytical functions with customized grouping and settings to analyze and visualize the integrated circRNA transcriptome data and identify circRNA biomarkers in human diseases (Table 1, Figure 1B, upper left panel and Supplementary Table S1). Neither the disease-related circRNA transcriptome datasets nor the 13 analytical functions are provided by existing circRNA databases such as circAtlas (22), circRNADb (15), CircBank (21), CircR2Disease (16) and circRNADisease (17) (Supplementary Table S1). Furthermore, circMine provides the Web Server tool, which allows researchers to study their own circRNA transcriptome datasets with the aforementioned 13 analytical functions, whereas only circAtlas (22) supports two of these functions, without customized grouping and settings (Table 1, Figure 1B, upper left panel and Supplementary Table S1).
Although circAtlas (22), circRNADb (15) and CircBank (21) provide biological annotations (such as interactions and translatability) of circRNAs, circMine additionally provides three tools to predict the biological functions of circRNAs: circRNA-miRNA prediction, circRNA IRES prediction and ribo-circRNA location (Figure 1B, lower left panel and Supplementary Table S1). The ribo-circRNA location function is not offered by any of these databases (Supplementary Table S1). Tools such as CIRCexplorer (6) perform upstream analysis of circRNA sequencing data to identify and annotate circRNAs and quantify their expression. In contrast, circMine focuses on downstream analysis of disease-related circRNA transcriptome data to identify the clinical and biological significance of circRNAs in different human diseases (Figure 1 and Supplementary Table S1). Thus, circMine differs substantially from existing circRNA databases and tools and offers data and functions that are not currently available elsewhere. In a pilot study, circMine systematically identified five circRNAs associated with prostate cancer progression. The results suggest that these circRNAs may modulate several critical biological functions and pathways by being translated and by interacting with miRNAs, although this requires further validation in vitro and in vivo. As a unique resource, circMine can therefore serve as an invaluable tool for bench and computational researchers investigating the role of circRNAs in human diseases in depth. In the future, more human circRNA transcriptome data are expected to be generated on different high-throughput platforms. To better serve the research community, circMine will continue to enrich its data resources and analytical power by integrating both public and newly shared private data.
circMine will add new functionalities for analyzing and visualizing circRNA data every 6 months. Further data features may be incorporated by integrating genomic, proteomic and epigenetic circRNA-related data, and even circRNA data from other organisms. These data would come from public resources such as the Catalogue of Somatic Mutations in Cancer (42), the GWAS Catalog (43), The Cancer Genome Atlas (44), the International Cancer Genome Consortium (45) and GEO (23,24). For example, we plan to integrate datasets that include both circRNA and linear RNA expression data. Once enough such datasets become publicly available, we will add a circRNA-linear RNA co-expression function to the co-expression module. We believe these additional data and functionalities will further increase the usefulness of circMine to the circRNA research community.

DATA AVAILABILITY

The circMine database platform is accessible at http://hpcc.siat.ac.cn/circmine and http://www.biomedical-web.com/circmine.

REFERENCES (selected; 45 in total)

1. CircInteractome: A web tool for exploring circular RNAs and their interacting proteins and microRNAs.

Authors: Dawood B Dudekula; Amaresh C Panda; Ioannis Grammatikakis; Supriyo De; Kotb Abdelmohsen; Myriam Gorospe
Journal: RNA Biol Date: 2016

2. NCBI GEO: archive for functional genomics data sets--update.

Authors: Tanya Barrett; Stephen E Wilhite; Pierre Ledoux; Carlos Evangelista; Irene F Kim; Maxim Tomashevsky; Kimberly A Marshall; Katherine H Phillippy; Patti M Sherman; Michelle Holko; Andrey Yefanov; Hyeseung Lee; Naigong Zhang; Cynthia L Robertson; Nadezhda Serova; Sean Davis; Alexandra Soboleva
Journal: Nucleic Acids Res Date: 2012-11-27

3. CSCD: a database for cancer-specific circular RNAs.

Authors: Siyu Xia; Jing Feng; Ke Chen; Yanbing Ma; Jing Gong; Fangfang Cai; Yuxuan Jin; Yang Gao; Linjian Xia; Hong Chang; Lei Wei; Leng Han; Chunjiang He
Journal: Nucleic Acids Res Date: 2018-01-04

4. circRNADb: A comprehensive database for human circular RNAs with protein-coding annotations.

Authors: Xiaoping Chen; Ping Han; Tao Zhou; Xuejiang Guo; Xiaofeng Song; Yan Li
Journal: Sci Rep Date: 2016-10-11

5. Comprehensive characterization of circular RNAs in ~ 1000 human cancer cell lines.

Authors: Hang Ruan; Yu Xiang; Junsuk Ko; Shengli Li; Ying Jing; Xiaoyu Zhu; Youqiong Ye; Zhao Zhang; Tingting Mills; Jing Feng; Chun-Jie Liu; Ji Jing; Jin Cao; Bingying Zhou; Li Wang; Yubin Zhou; Chunru Lin; An-Yuan Guo; Xi Chen; Lixia Diao; Wenbo Li; Zhiao Chen; Xianghuo He; Gordon B Mills; Michael R Blackburn; Leng Han
Journal: Genome Med Date: 2019-08-26

6. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.

Authors: Annalisa Buniello; Jacqueline A L MacArthur; Maria Cerezo; Laura W Harris; James Hayhurst; Cinzia Malangone; Aoife McMahon; Joannella Morales; Edward Mountjoy; Elliot Sollis; Daniel Suveges; Olga Vrousgou; Patricia L Whetzel; Ridwan Amode; Jose A Guillen; Harpreet S Riat; Stephen J Trevanion; Peggy Hall; Heather Junkins; Paul Flicek; Tony Burdett; Lucia A Hindorff; Fiona Cunningham; Helen Parkinson
Journal: Nucleic Acids Res Date: 2019-01-08

7. miRBase: from microRNA sequences to function.

Authors: Ana Kozomara; Maria Birgaoanu; Sam Griffiths-Jones
Journal: Nucleic Acids Res Date: 2019-01-08

8. BLAST: a more efficient report with usability improvements.

Authors: Grzegorz M Boratyn; Christiam Camacho; Peter S Cooper; George Coulouris; Amelia Fong; Ning Ma; Thomas L Madden; Wayne T Matten; Scott D McGinnis; Yuri Merezhuk; Yan Raytselis; Eric W Sayers; Tao Tao; Jian Ye; Irena Zaretskaya
Journal: Nucleic Acids Res Date: 2013-04-22

9. IRESbase: A Comprehensive Database of Experimentally Validated Internal Ribosome Entry Sites.

Authors: Jian Zhao; Yan Li; Cong Wang; Haotian Zhang; Hao Zhang; Bin Jiang; Xuejiang Guo; Xiaofeng Song
Journal: Genomics Proteomics Bioinformatics Date: 2020-06-06

10. COSMIC: the Catalogue Of Somatic Mutations In Cancer.

Authors: John G Tate; Sally Bamford; Harry C Jubb; Zbyslaw Sondka; David M Beare; Nidhi Bindal; Harry Boutselakis; Charlotte G Cole; Celestino Creatore; Elisabeth Dawson; Peter Fish; Bhavana Harsha; Charlie Hathaway; Steve C Jupe; Chai Yin Kok; Kate Noble; Laura Ponting; Christopher C Ramshaw; Claire E Rye; Helen E Speedy; Ray Stefancsik; Sam L Thompson; Shicai Wang; Sari Ward; Peter J Campbell; Simon A Forbes
Journal: Nucleic Acids Res Date: 2019-01-08
