
ceRNAshiny: An Interactive R/Shiny App for Identification and Analysis of ceRNA Regulation.

Yueqiang Song¹, Jia Li¹, Yiming Mao², Xi Zhang¹,³.

Abstract

The competing endogenous RNA (ceRNA) network is a newly discovered mode of post-transcriptional regulation that controls both physiological and pathological processes. A growing number of studies have drawn on this theory to explore the function of novel non-coding RNAs, pseudogenes, circular RNAs, and messenger RNAs. Although several R packages and computational tools exist for analyzing ceRNA networks, there remains an urgent need for easy-to-use tools to identify ceRNA regulation. Moreover, conventional tools have mainly been devoted to investigating ceRNAs in malignancies rather than in neurodegenerative diseases. To fill this gap, we developed ceRNAshiny, an interactive R/Shiny application that integrates widely used computational methods and databases to construct, analyze, and visualize ceRNA networks, including differential gene analysis and functional annotation. In addition, the demo data in ceRNAshiny provide ceRNA network analyses of neurodegenerative diseases such as Parkinson's disease. Overall, ceRNAshiny is a user-friendly application that benefits all researchers, especially those who lack an established bioinformatic pipeline and are interested in studying ceRNA networks.


Keywords: R; Shiny; ceRNA; lncRNA; miRNA

Year: 2022 PMID: 35647026 PMCID: PMC9136144 DOI: 10.3389/fmolb.2022.865408

Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X

Introduction

The vast majority of the human genome consists of non-coding sequences, whose transcripts lack protein-coding capacity and are termed non-coding RNAs (ncRNAs) (Alexander et al., 2010). ncRNAs are now regarded as core regulators of gene transcription, epigenetic regulation, and post-transcriptional regulation, and they influence the occurrence, development, and diagnosis of diverse diseases (Esteller 2011). A hypothesis of how ncRNAs work, the competing endogenous RNA (ceRNA) network, has been proposed and gradually confirmed (Salmena et al., 2011). According to this theory, long non-coding RNAs (lncRNAs), pseudogenes, and circular RNAs (circRNAs) act as microRNA (miRNA) sponges via miRNA response elements (MREs) or messenger RNA (mRNA) binding sites, controlling the availability of endogenous miRNAs for binding to their target mRNAs; together they form a ceRNA network that modulates mRNA expression and regulates protein levels. These complex networks may provide multiple clues for unraveling the pathogenesis of diseases (Tay et al., 2014). Notably, as representative ceRNAs, lncRNA-associated ceRNA networks might be promising therapeutic targets. Because ceRNA networks are very large, computational methods are used to construct them theoretically, which provides convincing evidence for further verification in vitro or in vivo (Yang et al., 2018).
ceRNA-directed computational methods fall mainly into two classes: 1) methods that combine expression profiles with statistical indices, such as the Pearson correlation coefficient (PCC), mutual information (MI), sensitivity correlation (SI), multiple sensitivity correlation, conditional mutual information (CMI), intervention calculus when the DAG is absent (IDA), and liquid association (LA) (Lloyd 2000), and 2) mathematical models, such as the minimal model, stochastic model, mass-action model, coarse-grained model, and coarse-grained competition motif model (Le et al., 2017; Zhang et al., 2017, 2022). Additionally, a set of lncRNA–miRNA–mRNA pair databases has been established (Li et al., 2014; Le et al., 2017), such as lnCeDB (Das et al., 2014), LncCeRBase (Pan et al., 2019), miRSponge (Wang et al., 2015), LncACTdb (Wang et al., 2022), ceRDB (Sarver and Subramanian 2012), starBase (Li et al., 2014), HumanViCe (Ghosal et al., 2014), PceRBase (Yuan et al., 2017), Tarbase (Vergoulis et al., 2012), miRTarbase (Huang et al., 2020), miRecords (Xiao et al., 2009), miRWalk (Sticht et al., 2018), TargetScan (www.targetscan.org), miRanda (Betel et al., 2008), MicroCosm (Griffiths-Jones et al., 2008), PicTar (http://www.pictar.org/), DIANA-microT (Vlachos et al., 2015), PITA (http://genie.weizmann.ac.il/pubs/mir07/mir07_data.html), and CLASH (Helwak and Tollervey 2014). Some databases serve not only as prediction tools to guide experiments but also as free hubs of experimentally validated results. Current computational methods each have their own biases in prediction accuracy or sensitivity, and the contents of different databases vary with species, diseases, and tissues. Many R packages have been developed for constructing ceRNA networks, but all of them require proficiency in R (Li et al., 2018; Zhang et al., 2018, 2019, 2021; Wen et al., 2020).
Meanwhile, the recently developed CeNet Omnibus, an R/Shiny-based application, predicts ceRNA networks using different computational methods (Wen et al., 2021), with database information uploaded manually, and also analyzes the distribution of the networks' topological properties. By comparison, ceRNAshiny, which we developed, is intended to be more convenient and reliable for students and researchers trained in biology and medicine, especially those with limited programming experience. The analysis procedures that ceRNAshiny applies to sequencing data are widely accepted and meet users' basic requirements for sequencing data analysis. ceRNAshiny can also classify large sequencing datasets by RNA type, which is important and convenient when the data contain several RNA types. In addition, ceRNAshiny provides predictions based on expression, on sequence, and on both. Its built-in resources comprise a predicted database and an experimentally validated database, each aggregating the human-sourced contents of multiple databases. Based on these two types of databases, users can obtain multiple ceRNA networks more easily, which provides multiple options for subsequent data validation and wet-lab experiments. Overall, ceRNAshiny could be a useful tool for people who lack the time or background knowledge to obtain results rapidly. Built on the R/Shiny framework, ceRNAshiny offers a user-friendly web interface for the identification, analysis, and visualization of ceRNA regulation, including differential gene analysis and functional annotation.

Implementation

ceRNAshiny is an R-based Shiny application constructed using various R packages, including reshape2 (https://rdocumentation.org/packages/reshape2/versions/1.4.3), igraph (Csardi and Nepusz 2006), edgeR (Robinson et al., 2010), DESeq2 (Love et al., 2014), limma (Brophy et al., 1987; Ritchie et al., 2015), glmnet (Simon et al., 2011; Engebretsen and Bohlin 2019), yulab.utils (https://CRAN.R-project.org/package=yulab.utils), ggplot2 (Wickham et al., 2016), rvcheck (https://github.com/GuangchuangYu/rvcheck), shiny (https://github.com/rstudio/shiny/issues), shinythemes (https://rstudio.github.io/shinythemes/), DT (https://github.com/rstudio/DT), pheatmap (https://CRAN.R-project.org/package=pheatmap), ReactomePA (Yu and He 2016), and clusterProfiler (Yu et al., 2012). The Shiny platform was deployed on a web server to host the ceRNAshiny web application. The human-sourced lncRNA–miRNA–mRNA pairing information included in ceRNAshiny was obtained from various databases (John et al., 2004; Lewis et al., 2005), which fall into two categories: predicted databases, such as starBase 3.0 (also named ENCORI; Li et al., 2014), miRWalk (Sticht et al., 2018), TargetScan (www.targetscan.org), and miRanda (Betel et al., 2008), and experimentally validated databases, such as Tarbase (Vergoulis et al., 2012) and miRTarbase (Huang et al., 2020). To show the functionality and usability of ceRNAshiny, we used the publicly available array dataset GSE7621, which was generated from the substantia nigra of postmortem human brains of Parkinson’s disease (PD) patients and controls (Lesnick et al., 2007). Furthermore, GSE136666, which contains transcriptomic results of human substantia nigra and putamen samples from PD patients and age-matched controls, was chosen as the template for high-throughput RNA sequencing data (Xicoy et al., 2020). The app can be accessed here: https://cerna.shinyapps.io/cerna_shiny/.
The source code and related documents can be obtained through GitHub: https://github.com/yqsongGitHub/ceRNA_shiny.

Data Input

The ceRNAshiny app is best suited to users who are interested in ceRNA but have limited programming experience. The only step required of the user is to input the expression matrix, the group list, and (optionally) the annotation platform information, all in the same format as the template data, which can be downloaded. Both array data and high-throughput RNA sequencing data are accepted as the input expression matrix. If the input expression matrix is array data, the annotation platform information must be provided as well, with probe names as the row names of the matrix and sample names as the column names. If the input expression matrix is high-throughput RNA sequencing data, Ensembl identifiers should be provided as row names and sample names as column names.
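The input layout described above can be illustrated with a minimal sketch. It is written in Python purely for illustration (ceRNAshiny itself is an R/Shiny app), and the function name `check_inputs`, the sample names, and the placeholder Ensembl-style identifiers are all hypothetical; the template files distributed with the app remain the authoritative description of the format.

```python
# Hypothetical illustration of the input layout; not part of ceRNAshiny.

def check_inputs(expr, groups):
    """Check that an expression matrix and a group list are consistent.

    expr   : dict mapping row name (probe or Ensembl ID) -> {sample: value}
    groups : dict mapping sample name -> group label (e.g. "PD", "control")
    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []
    # Every row should carry the same set of sample columns.
    sample_sets = {frozenset(row) for row in expr.values()}
    if len(sample_sets) > 1:
        problems.append("rows do not all share the same sample columns")
    samples = set().union(*sample_sets) if sample_sets else set()
    # Every sample needs a group label, and every label needs a sample.
    missing = sorted(samples - set(groups))
    if missing:
        problems.append("samples without a group label: " + ", ".join(missing))
    extra = sorted(set(groups) - samples)
    if extra:
        problems.append("group labels for unknown samples: " + ", ".join(extra))
    # Differential analysis compares at least two groups.
    if len(set(groups.values())) < 2:
        problems.append("at least two groups are needed for differential analysis")
    return problems


# RNA-seq style input: identifiers as row names, samples as column names.
# The identifiers below are placeholders, not real annotations.
expr = {
    "ENSG_PLACEHOLDER_1": {"S1": 10.0, "S2": 12.0, "S3": 3.0, "S4": 4.0},
    "ENSG_PLACEHOLDER_2": {"S1": 5.0, "S2": 6.0, "S3": 15.0, "S4": 14.0},
}
groups = {"S1": "PD", "S2": "PD", "S3": "control", "S4": "control"}
print(check_inputs(expr, groups))
```

A check of this kind catches the most common upload mistakes (a sample missing from the group list, or a single-group design) before any analysis is attempted.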

Data Processing and Analysis

Data processing comprises the following steps: missing value imputation, log2 transformation, background adjustment, and quantile normalization. Cluster analysis results are presented as heat maps so that users can identify and remove low-quality samples. Depending on the category of input data, differential gene analysis, annotation analysis, and enrichment analysis are performed using the corresponding R packages.
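Two of the steps named above, log2 transformation and quantile normalization, can be sketched as follows. This is a generic Python illustration and not the code used by ceRNAshiny, which relies on R packages; the function names, the pseudocount of 1, and the tie-breaking rule are assumptions made for the example, and missing value imputation and background adjustment are not shown.

```python
import math


def log2_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) to every value (rows = genes, columns = samples)."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]


def quantile_normalize(matrix):
    """Quantile-normalize columns so that every sample shares one distribution.

    Ties are broken by row order, a simplification compared with
    production implementations.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # For each column, the row indices sorted by that column's values.
    order = [sorted(range(n_rows), key=lambda i, j=j: matrix[i][j])
             for j in range(n_cols)]
    # Reference distribution: mean of the k-th smallest value across samples.
    ref = [sum(matrix[order[j][k]][j] for j in range(n_cols)) / n_cols
           for k in range(n_rows)]
    # Replace each value by the reference value of its rank.
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for rank, i in enumerate(order[j]):
            out[i][j] = ref[rank]
    return out


# Toy matrix: 4 genes (rows) measured in 3 samples (columns).
raw = [[5.0, 4.0, 3.0],
       [2.0, 1.0, 4.0],
       [3.0, 4.0, 6.0],
       [4.0, 2.0, 8.0]]
norm = quantile_normalize(raw)
for row in norm:
    print(row)
```

After normalization every column contains the same set of values, so differences between samples reflect rank changes of individual genes rather than global intensity shifts.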

Network Construction

Based on differentially expressed genes (DEGs), the PCC, sensitivity partial Pearson correlation (SPPC), and partial Pearson correlation (PC) algorithms are employed to predict potential lncRNA–miRNA–mRNA pairs, which can identify the maximum number of miRNA sponge interactions (Zhang et al., 2019). In addition, we downloaded and compared the relevant information in the starBase 3.0 (Li et al., 2014), miRWalk (Sticht et al., 2018), TargetScan (www.targetscan.org), and miRanda (Betel et al., 2008) databases and aggregated the miRNA–lncRNA pairs (63,556 pairs) and miRNA–mRNA pairs (1,441,765 pairs) recorded in them as the predicted database. For the experimentally validated database, we aggregated the contents of the Tarbase database (Vergoulis et al., 2012) and the miRTarbase database (Huang et al., 2020), which together contain 1,506 miRNA–lncRNA pairs and 652,703 miRNA–mRNA pairs. ceRNA networks are thus predicted using three approaches. In the first approach, statistical relationships between genes are analyzed from gene expression using the different algorithms (PCC, SPPC, and PC) to identify potential lncRNA–miRNA–mRNA pairs. In the second approach, the uploaded data are compared with the predicted database and the experimentally validated database, respectively, to obtain potential lncRNA–miRNA–mRNA pairs based on sequence. In the third approach, the results of the first two approaches are intersected to obtain lncRNA–miRNA–mRNA pairs that satisfy both requirements; this intersection is output as more credible clues for subsequent experiments. Finally, enrichment analysis is performed on the resulting ceRNA networks to identify biological functions and pathways.
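The expression-based statistics can be made concrete with a short sketch. The Pearson and first-order partial correlation formulas below are the textbook definitions; treating the "sensitivity" of a lncRNA–mRNA pair as the drop in correlation once the shared miRNA is controlled for follows a common convention in the ceRNA literature, and whether ceRNAshiny computes SPPC in exactly this way is an assumption. The code is Python for illustration only (the app is written in R), and the expression values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_pearson(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def sensitivity(lnc, mrna, mirna):
    """Drop in lncRNA-mRNA correlation once the shared miRNA is controlled for."""
    return pearson(lnc, mrna) - partial_pearson(lnc, mrna, mirna)


# Invented expression values across six samples (illustration only).
mirna = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lnc = [9.0, 8.5, 6.0, 6.5, 3.0, 2.5]
mrna = [8.0, 9.0, 7.0, 5.0, 4.0, 2.0]

print("PCC(lncRNA, mRNA):", round(pearson(lnc, mrna), 3))
print("partial PCC given miRNA:", round(partial_pearson(lnc, mrna, mirna), 3))
print("sensitivity:", round(sensitivity(lnc, mrna, mirna), 3))
```

In this toy example the lncRNA and mRNA are strongly co-expressed, but most of that co-expression disappears after conditioning on the miRNA, which is the pattern expected when the two transcripts compete for the same miRNA.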

Results

Case Study: GSE7621 Expression Data of Substantia Nigra from a Postmortem Human Brain of Parkinson’s Disease

For demonstration, we used the expression profiling dataset GSE7621 from the GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array platform (54317 probes per sample) to identify ceRNA networks in 25 cases with PD (Lesnick et al., 2007). In addition, the dataset GSE136666 from GPL11154 Illumina HiSeq 2000 (Homo sapiens) was chosen as the template for high-throughput RNA sequencing data (Xicoy et al., 2020). Because the two workflows are similar, the analysis of GSE7621 is used as the example. Expression profiling data are uploaded through the input table panel of ceRNAshiny (Figure 1A) and are then available for further analyses. By clicking the buttons in the table panel, users can run the following analyses: “Heatmap plot,” “Volcano plot,” “Enrichment,” “RNA classification,” “Expression-based ceRNA prediction,” “Sequence-based ceRNA prediction,” and “ceRNA prediction based on expression and sequence” (Figure 1A). All of these analyses build on differential gene analysis. Sliders allow users to adjust the p value and fold change thresholds independently and to obtain the corresponding plots (volcano plot, heatmap plot, enrichment plots, etc.) and results (Figure 2). Moreover, ceRNAshiny helps users distinguish lncRNAs, miRNAs, and mRNAs among the many genes in the expression profiles (Figure 3A). Note that the correct type of uploaded data should be selected in the panel to avoid errors during these analyses (Figure 3B).
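The effect of the p value and fold change sliders can be sketched as a simple thresholding step on the differential analysis results. The function name `select_degs`, the default cutoffs, and the gene statistics below are hypothetical and chosen only for illustration; the sketch is in Python, whereas the app performs this step in R.

```python
import math


def select_degs(stats, p_cutoff=0.05, fc_cutoff=2.0):
    """Label each gene as "up", "down", or "ns" (not significant).

    stats     : dict mapping gene -> (log2 fold change, p value)
    p_cutoff  : genes need p < p_cutoff to be called
    fc_cutoff : fold change threshold on the linear scale; it is
                converted to a |log2 fold change| threshold
    """
    lfc_cutoff = math.log2(fc_cutoff)
    labels = {}
    for gene, (log2fc, p) in stats.items():
        if p < p_cutoff and log2fc >= lfc_cutoff:
            labels[gene] = "up"
        elif p < p_cutoff and log2fc <= -lfc_cutoff:
            labels[gene] = "down"
        else:
            labels[gene] = "ns"
    return labels


# Invented statistics for four genes (illustration only).
stats = {
    "geneA": (1.8, 0.001),
    "geneB": (-1.2, 0.01),
    "geneC": (0.4, 0.0005),
    "geneD": (2.5, 0.2),
}
print(select_degs(stats))                 # default thresholds
print(select_degs(stats, fc_cutoff=3.0))  # stricter fold change threshold
```

Moving either threshold changes which genes are coloured in the volcano plot and which genes are passed on to the heatmap, enrichment, and ceRNA prediction steps.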

FIGURE 1

Main interface of ceRNAshiny. (A) The main interface of ceRNAshiny for introduction and analysis.

FIGURE 2

Using ceRNAshiny to generate differentially expressed genes and enrichment analysis. (A) Panels for parameter configuration. (B) The generated heatmap plot using the template dataset. (C) The generated volcano plot using the template dataset. (D) The generated enrichment analysis using the template dataset.

FIGURE 3

Using ceRNAshiny to classify RNAs. (A) Panels to show the type of differential genes and to adjust parameters. (B) The generated results of RNA classification using the template dataset.

Three buttons in the interface, “Expression-based ceRNA prediction,” “Sequence-based ceRNA prediction,” and “ceRNA prediction based on expression and sequence,” are designed for identifying ceRNA networks. Users can select the “Expression-based ceRNA prediction” module to construct and separately download potential lncRNA–miRNA–mRNA pairs whose gene expression conforms to the statistical relationships defined by the different algorithms (PCC, SPPC, and PC) (Figures 4A–C). In addition, the “Sequence-based ceRNA prediction” module allows users to match genes in the uploaded data against the predicted and experimentally validated database resources to find potential lncRNA–miRNA–mRNA pairs based on sequence (Figures 5A, B). Finally, through the “ceRNA prediction based on expression and sequence” module, users can create further ceRNA networks in which lncRNA–miRNA–mRNA pairs satisfy both the PCC algorithm and sequence prediction (Figures 6A, B). The analysis uses default parameters, which can be adjusted as required.
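The sequence-based and combined modules amount to a join followed by an intersection, which can be sketched as below. The sketch assumes that the database is available as lists of (miRNA, lncRNA) and (miRNA, mRNA) pairs and that all three members of a triplet must appear among the uploaded differentially expressed genes; both points, along with the function name and the gene labels, are assumptions made for illustration in Python rather than a description of the R code in ceRNAshiny.

```python
def sequence_based_triplets(mirna_lnc, mirna_mrna, degs):
    """Join miRNA-lncRNA and miRNA-mRNA pairs on the shared miRNA.

    mirna_lnc  : iterable of (miRNA, lncRNA) pairs from a database
    mirna_mrna : iterable of (miRNA, mRNA) pairs from a database
    degs       : set of gene names found in the uploaded data
    Returns a set of (lncRNA, miRNA, mRNA) triplets. Requiring all
    three members to be in `degs` is an assumption of this sketch.
    """
    # Index lncRNAs by the miRNA that targets them.
    lnc_by_mirna = {}
    for mirna, lnc in mirna_lnc:
        lnc_by_mirna.setdefault(mirna, set()).add(lnc)
    triplets = set()
    for mirna, mrna in mirna_mrna:
        if mirna not in degs or mrna not in degs:
            continue
        for lnc in lnc_by_mirna.get(mirna, ()):
            if lnc in degs:
                triplets.add((lnc, mirna, mrna))
    return triplets


# Hypothetical database pairs and differentially expressed genes.
mirna_lnc = [("miR-x", "lncA"), ("miR-x", "lncB"), ("miR-y", "lncA")]
mirna_mrna = [("miR-x", "mRNA1"), ("miR-y", "mRNA2"), ("miR-z", "mRNA3")]
degs = {"lncA", "miR-x", "miR-y", "mRNA1", "mRNA2", "mRNA3"}

by_sequence = sequence_based_triplets(mirna_lnc, mirna_mrna, degs)

# Triplets supported by an expression-based method (hypothetical result).
by_expression = {("lncA", "miR-x", "mRNA1"), ("lncB", "miR-x", "mRNA1")}

# Combined module: keep triplets supported by both lines of evidence.
by_both = by_sequence & by_expression
print(sorted(by_sequence))
print(sorted(by_both))
```

Because the combined result is a plain set intersection, it can never be larger than either input, which is why it is treated as the more conservative candidate list.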

FIGURE 4

Using ceRNAshiny to generate “Expression-based ceRNA prediction.” (A) Panels to show the type of algorithm and adjust parameters. (B) The generated algorithm prediction and its enrichment results based on the PCC, PC, and SPPC algorithms using the template dataset.

FIGURE 5

Using ceRNAshiny to generate “Sequence-based ceRNA prediction.” (A) Panels to adjust parameters. (B) The generated database prediction and its enrichment results based on databases using the template dataset.

FIGURE 6

Using ceRNAshiny to generate “ceRNA prediction based on expression and sequence.” (A) Panels to adjust parameters. (B) The generated ceRNA network and its enrichment results based on the PCC algorithm and database sources using the template dataset.

Conclusion and Outlook

ceRNAshiny is designed with a clear purpose: to identify and visualize ceRNA networks for users with limited programming experience. It supports customizable generation of volcano plots, heatmap plots, ceRNA graphs, and other results from input expression datasets. Unlike conventional tools, ceRNAshiny not only identifies ceRNA networks using computational methods but also matches lncRNA–miRNA–mRNA pairs against multiple human-sourced databases, eliminating the need for users to query online databases. With a basic R environment and an internet connection, this user-friendly Shiny application is set up automatically and can be used intuitively to review the reported results visually. We provide downloadable source code and an offline version for researchers who are proficient in programming. Since the Shiny package was designed for building interactive web applications, it is straightforward to deploy ceRNAshiny on servers as an online service, so that it can be used by researchers from a variety of backgrounds and with a range of interests.

37 in total

1. Annotating non-coding regions of the genome.

Authors: Roger P Alexander; Gang Fang; Joel Rozowsky; Michael Snyder; Mark B Gerstein
Journal: Nat Rev Genet Date: 2010-07-13 Impact factor: 53.242

2. limma powers differential expression analyses for RNA-sequencing and microarray studies.

Authors: Matthew E Ritchie; Belinda Phipson; Di Wu; Yifang Hu; Charity W Law; Wei Shi; Gordon K Smyth
Journal: Nucleic Acids Res Date: 2015-01-20 Impact factor: 16.971

3. Competing endogenous RNA database.

Authors: Aaron L Sarver; Subbaya Subramanian
Journal: Bioinformation Date: 2012-08-03

4. Human MicroRNA targets.

Authors: Bino John; Anton J Enright; Alexei Aravin; Thomas Tuschl; Chris Sander; Debora S Marks
Journal: PLoS Biol Date: 2004-10-05 Impact factor: 8.029

5. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.

Authors: Michael I Love; Wolfgang Huber; Simon Anders
Journal: Genome Biol Date: 2014 Impact factor: 13.583

6. PceRBase: a database of plant competing endogenous RNA.

Authors: Chunhui Yuan; Xianwen Meng; Xue Li; Nicola Illing; Robert A Ingle; Jingjing Wang; Ming Chen
Journal: Nucleic Acids Res Date: 2016-10-07 Impact factor: 16.971

7. miRTarBase 2020: updates to the experimentally validated microRNA-target interaction database.

Authors: Hsi-Yuan Huang; Yang-Chi-Dung Lin; Jing Li; Kai-Yao Huang; Sirjana Shrestha; Hsiao-Chin Hong; Yun Tang; Yi-Gang Chen; Chen-Nan Jin; Yuan Yu; Jia-Tong Xu; Yue-Ming Li; Xiao-Xuan Cai; Zhen-Yu Zhou; Xiao-Hang Chen; Yuan-Yuan Pei; Liang Hu; Jin-Jiang Su; Shi-Dong Cui; Fei Wang; Yue-Yang Xie; Si-Yuan Ding; Meng-Fan Luo; Chih-Hung Chou; Nai-Wen Chang; Kai-Wen Chen; Yu-Hsiang Cheng; Xin-Hong Wan; Wen-Lian Hsu; Tzong-Yi Lee; Feng-Xiang Wei; Hsien-Da Huang
Journal: Nucleic Acids Res Date: 2020-01-08 Impact factor: 16.971

8. CeNet Omnibus: an R/Shiny application to the construction and analysis of competing endogenous RNA network.

Authors: Xiao Wen; Lin Gao; Tuo Song; Chaoqun Jiang
Journal: BMC Bioinformatics Date: 2021-02-18 Impact factor: 3.169

9. LncACTdb 3.0: an updated database of experimentally supported ceRNA interactions and personalized networks contributing to precision medicine.

Authors: Peng Wang; Qiuyan Guo; Yue Qi; Yangyang Hao; Yue Gao; Hui Zhi; Yuanfu Zhang; Yue Sun; Yakun Zhang; Mengyu Xin; Yunpeng Zhang; Shangwei Ning; Xia Li
Journal: Nucleic Acids Res Date: 2022-01-07 Impact factor: 16.971

10. The microRNA.org resource: targets and expression.

Authors: Doron Betel; Manda Wilson; Aaron Gabow; Debora S Marks; Chris Sander
Journal: Nucleic Acids Res Date: 2007-12-23 Impact factor: 16.971