
YM500v3: a database for small RNA sequencing in human cancer research.

I-Fang Chung1, Shing-Jyh Chang2, Chen-Yang Chen1, Shu-Hsuan Liu3,4, Chia-Yang Li5,6, Chia-Hao Chan2, Chuan-Chi Shih2, Wei-Chung Cheng7,4.   

Abstract

We previously presented the YM500 database, which contains >8000 small RNA sequencing (smRNA-seq) data sets and integrated analysis results for various cancer miRNome studies. In the updated YM500v3 database (http://ngs.ym.edu.tw/ym500/) presented herein, we not only focus on miRNAs but also on other functional small non-coding RNAs (sncRNAs), such as PIWI-interacting RNAs (piRNAs), tRNA-derived fragments (tRFs), small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs). There is growing knowledge of the role of sncRNAs in gene regulation and tumorigenesis. We have also incorporated >10 000 cancer-related RNA-seq and >3000 more smRNA-seq data sets into the YM500v3 database. Furthermore, there are two main new sections, 'Survival' and 'Cancer', in this updated version. The 'Survival' section provides the survival analysis results in all cancer types or in a user-defined group of samples for a specific sncRNA. The 'Cancer' section provides the results of differential expression analyses, miRNA-gene interactions and cancer miRNA-related pathways. In the 'Expression' section, sncRNA expression profiles across cancer and sample types are newly provided. Cancer-related sncRNAs hold potential for both biotech applications and basic research.
© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2016        PMID: 27899625      PMCID: PMC5210564          DOI: 10.1093/nar/gkw1084

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Since next generation sequencing (NGS) has become the norm for large-scale genomics research (e.g. The Cancer Genome Atlas, TCGA), small RNA sequencing (smRNA-seq) has shed light on the variations in the expression of small non-coding RNAs (sncRNAs) among different developmental stages and disease states (1). Although smRNA-seq has become popular in genomics studies, most such research has focused primarily on miRNAs, which represent only a subset of all small RNA species. However, the functionality of other sncRNAs, such as PIWI-interacting RNAs (piRNAs), tRNA-derived fragments (tRFs), small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs), remains an important topic. Increasing evidence has shown that these non-miRNA sncRNAs also play significant roles in regulating cellular processes, such that their dysfunction can contribute to cancer progression (2). Hence, the investigation of the dysregulation of other classes of sncRNAs in the context of cancer, as well as of their therapeutic and diagnostic value, is of great importance. For example, a growing number of studies have reported that aberrant piRNA expression is a signature marker across distinct tumor types (3) and that snoRNAs act as oncogenes in tumorigenesis (4–6). The integration of large-scale smRNA-seq data helps researchers study the roles of these functional sncRNAs in cancer progression, but questions remain concerning the optimal methodologies for analysis, translation and utilization of such massive amounts of data (7).
The role of miRNAs in cancer progression has been well investigated in the past decade (8–11). miRNAs can affect gene expression not only by suppressing protein translation but also by reducing the mRNA expression of a target gene, resulting in a correlation between the expression levels of miRNAs and their target genes (12–14). Consequently, the expression relationships between miRNAs and genes are often used to predict miRNA–gene interactions (15–17). Therefore, integrating miRNA and mRNA expression data across different cancer types is another approach to providing a global view of miRNA–gene interactions, including cancer-specific and cancer-wide miRNA–gene regulatory networks. For instance, Meng et al. utilized the expression of miRNAs and mRNAs in TCGA to identify miRNA–target interactions (18).
Many miRNA markers have been proposed to be predictive of patient prognoses and clinical responses and are being investigated in clinical trials (8). An important step that researchers must take prior to proposing miRNA-based biomarkers for clinical validation is their evaluation in independent patient cohorts, and several web tools, such as SurvMicro (19) and PROGmiR (20), have been developed to help researchers link miRNA expression with cancer outcomes.
Previously, we developed the YM500 database (21,22), which contains more than 8000 cancer-related smRNA-seq data sets and includes analysis pipelines for novel miRNA prediction, arm switching discovery, isomiR identification and miRNA quantification from smRNA-seq. The previous version of this database focused only on miRNAs. For the updated version of the database, YM500v3, presented in this study, we also examined other functional sncRNAs in smRNA-seq data sets and incorporated >10 000 cancer-related RNA-seq data sets and >3000 more smRNA-seq data sets from TCGA. Moreover, two major new sections, ‘Survival’ and ‘Cancer’, are provided in the YM500v3 database. The ‘Survival’ section provides the survival analysis results for all cancer types or a user-defined group of samples for a specific sncRNA. The ‘Cancer’ section provides results regarding the differential expression of sncRNAs and genes, miRNA–gene regulatory networks and cancer miRNA-related pathways.

DATA COLLECTION AND SMALL RNA ANNOTATION

The new smRNA-seq and RNA-seq data sets and clinical data in TCGA were downloaded from CGHub (https://cghub.ucsc.edu/) and pre-processed as described in our previous studies (22–24). In brief, all sequencing data were pre-processed by in-house scripts. The clinical data for each individual were manually curated based on the common data element format, the standard elements of which are used in TCGA. The annotations of miRNAs and other sncRNAs, such as piRNAs, snRNAs, snoRNAs and tRFs, are based on miRBase database R21 (25) and DASHR database v1.0 (26), respectively. The DASHR database contains 7641 sncRNA gene records and 9703 annotated mature sncRNA product records. Supplementary Table S1 shows the detailed information of sncRNAs in YM500v3.

DIFFERENTIAL EXPRESSION AND miRNA-TARGET INTERACTIONS

For differential expression analysis, we utilized an R/Bioconductor package, DESeq (27,28), to identify differentially expressed miRNAs, other non-miRNA sncRNAs and genes. The miRNA-target interactions in the YM500v3 database are grouped into three types: ‘Validated’, ‘Predicted’ and ‘Without any evidence’. The ‘Validated’ interactions are based on miRTarBase database Release 6.1 (29), which contains >366 000 interactions. The predicted miRNA targets were identified by 12 miRNA target prediction tools: DIANA-microT (30), MicroT4 (31), miRBridge (32), miRDB (33), miRMap (34), PITA (35), RNAhybrid (36), TargetScan (37), PICTAR2 (38), RNA22 (39), miRWalk (40) and miRanda (41). Only the targets that were identified by at least six tools were retained to improve the reliability of the prediction results. In the YM500v3 database, for a specific cancer type, the Pearson, Spearman and Kendall correlations are calculated for each miRNA–gene pair, but only among the miRNAs and genes identified as differentially expressed by DESeq with q < 0.05 and fold change > 2. The maximum absolute correlation coefficient, max(|R|), and the minimum P-value of the three correlation tests are also calculated for further filtering.
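The filtering steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the database's actual pipeline: the function names, the toy expression values and the reading of "fold change > 2" as applying in both directions are assumptions, Kendall's correlation is computed as tau-b, and the P-values of the three tests are omitted to keep the example dependency-free.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b, computed over all sample pairs (O(n^2))."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to neither term
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + ties_x) * (untied + ties_y))


def max_abs_r(x, y):
    """max(|R|) over the Pearson, Spearman and Kendall coefficients."""
    return max(abs(f(x, y)) for f in (pearson, spearman, kendall_tau_b))


def is_differential(q_value, fold_change):
    # Assumption: "fold change > 2" covers both up- and down-regulation.
    return q_value < 0.05 and (fold_change > 2 or fold_change < 0.5)


def consensus_targets(predictions, min_tools=6):
    """Keep the miRNA-gene pairs predicted by at least min_tools tools."""
    return {pair for pair, tools in predictions.items() if len(tools) >= min_tools}


# Toy expression values for one miRNA and one gene (made up for illustration).
mirna_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_expr = [10.0, 8.0, 6.5, 3.0, 1.0]
print(max_abs_r(mirna_expr, gene_expr))
```

Taking the maximum over three coefficients lets a pair pass the filter whether its relationship is linear (Pearson) or merely monotonic (Spearman, Kendall).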

WEB INTERFACE

Expression

This section now contains not only miRNAs but also other functional sncRNAs annotated in the DASHR database. Several statistical charts have been added to the ‘Expression’ section to help researchers understand the expression profile of a given sncRNA across distinct cancer types. For example, the expression profiles of a miRNA and a piRNA across different cancers by sample type are illustrated by boxplots in Figure 1A and B, respectively. Supplementary Figure S1A and B show the distribution of the log2 ratio (tumors compared to adjacent normal tissues) across cancer types and the expression boxplot by sample type for each cancer type, respectively. Moreover, a given sncRNA may have different IDs in different sources. We therefore also provide a sequence search function in the new database to overcome any inconsistencies in the IDs used by different sources.
Figure 1.

The ‘Expression’ section. The exemplified expression boxplots of the (A) miRNA and the (B) piRNA across distinct cancers by sample types.


Survival

This new section has two features: ‘All Cancer Types’ and ‘Specific Sample Group’. ‘All Cancer Types’ displays the survival analysis of a specific sncRNA (either a miRNA or another sncRNA) across all cancer types (Figure 2A), including a summary table for all the cancers and a Kaplan–Meier plot for each individual cancer type. In addition, two menu bars control the stratification method (such as ‘mean’ or ‘median’) and the follow-up time, and the results are updated immediately. By default, the median expression value is used to divide the patients into two groups and the entire follow-up time is used. ‘Specific Sample Group’ helps researchers define a subgroup of samples in a single cancer type according to dozens of clinical characteristics, such as triple negative breast cancer, and perform survival analysis on that subgroup. Figure 2 shows that high expression of hsa-miR-497-5p is related to good prognosis in triple negative breast cancer (Figure 2B) but does not significantly correlate with good prognosis in all breast cancer patients (Figure 2A).
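The stratification and follow-up controls described above can be illustrated with a minimal Kaplan–Meier sketch in Python. The function names, the handling of samples exactly at the cutoff (assigned to the ‘low’ group) and the truncation of follow-up by censoring are assumptions made for this example; the paper does not specify these details, and no significance test (such as a log-rank test) is included.

```python
from statistics import mean, median


def stratify(expression, method="median"):
    """Label each sample 'high' or 'low' relative to the cohort mean or median.

    Assumption: samples exactly at the cutoff are placed in the 'low' group.
    """
    cutoff = median(expression) if method == "median" else mean(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times, events, max_follow_up=None):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is True for a death and False for a censored observation.
    If max_follow_up is given, observations beyond it are censored at that time.
    """
    data = []
    for t, e in zip(times, events):
        if max_follow_up is not None and t > max_follow_up:
            t, e = max_follow_up, False
        data.append((t, e))

    survival = 1.0
    curve = []
    for t in sorted({t for t, e in data if e}):
        at_risk = sum(1 for u, _ in data if u >= t)
        deaths = sum(1 for u, e in data if u == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy cohort (made-up values): expression, follow-up in months, death indicator.
expression = [1.0, 2.0, 3.0, 4.0]
times = [5, 8, 12, 20]
events = [True, False, True, True]

groups = stratify(expression, method="median")
for label in ("low", "high"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(label, curve)
```

Each group's curve is estimated separately, which corresponds to the two lines of a Kaplan–Meier plot for the high- and low-expression patients.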
Figure 2.

Two features of the ‘Survival’ section. (A) ‘All cancer types’ contains a summary table for all the cancers and a Kaplan–Meier plot for each individual cancer type. (B) ‘Specific sample group’ helps investigators define a subgroup of patients in a cancer type and provides a Kaplan–Meier plot for the subgroup. Both features contain two menu bars to control the stratification method and the follow-up time.

Cancer

The ‘Cancer’ section stores the calculated results of differential expression analyses, miRNA–gene interactions and cancer miRNA-related pathways for each cancer type that has smRNA-seq and RNA-seq data of normal and tumor tissues for the same individuals. Figure 3 shows the results of uterine corpus endometrial carcinoma in TCGA for 33 adjacent normal and 513 primary tumor tissues (Figure 3A). There are 175 miRNAs (Figure 3B), 170 other sncRNAs (Figure 3C) and 3148 genes (Figure 3D) differentially expressed between normal and tumor tissues. The correlations of each miRNA–gene pair between the differentially expressed miRNAs and genes were calculated and divided into three groups, namely, ‘Validated’, ‘Predicted’ and ‘Without any evidence’ (Figure 3E). To illustrate the many-to-many relationships among miRNA–gene interactions (Figure 3F), the Cytoscape Web (48) tool is embedded for interactive network visualization. The genes that interacted with miRNAs were further functionally analyzed to address the cancer miRNA-related pathways (Figure 3G). Detailed information regarding the functional enrichment analysis method was presented in our previous study (24). Two menu bars are also provided to control the criteria, namely the max(|R|) and the number of prediction tools, used to display the corresponding results.
Figure 3.

The ‘Cancer’ section. This section stores the calculated results by (A) cancer type, including the results of differential expression analysis for (B) miRNAs, (C) non-miRNA sncRNAs and (D) mRNAs. The correlations of each miRNA–gene pair were calculated and divided into three groups, namely (E) ‘Validated’, ‘Predicted’ and ‘Without any evidence’, and are displayed in an (F) interactive network visualization. (G) The cancer miRNA-related pathways were identified from the miRNA-interacting genes through functional enrichment analysis. Another feature, ‘Specific miRNA-gene pairs’, helps researchers examine the interactions between miRNAs and genes by (H) user-defined criteria, after which the (I) miRNA–gene pairs are displayed immediately. The width of the line in (I) indicates the number of records.

We also provide another feature, ‘Specific miRNA-gene pairs’, in the ‘Cancer’ section in order to help researchers examine the interactions between miRNAs and genes by user-defined criteria. Researchers can enter multiple miRNAs and/or genes, and can also define the interactions according to max(|R|), minimum P-value, the number of prediction tools and the validated information for the miRNA-gene pairs (Figure 3H). After a query is submitted, the miRNA–gene pairs identified according to the user-defined criteria are then displayed immediately (Figure 3I). For the interactions supported by multiple cancer types, the width of the line indicates the number of records.
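As a rough illustration of how such a query could translate into line widths, the Python sketch below filters per-cancer interaction records by user-defined criteria and counts the surviving records for each miRNA–gene pair. The record layout, the field names, the placeholder miRNA and gene identifiers and the function `query_pairs` are all hypothetical; they are not taken from the YM500v3 implementation.

```python
from collections import Counter

# Hypothetical records: one row per (cancer type, miRNA, gene) interaction.
# Identifiers and values are placeholders, not real results.
records = [
    {"cancer": "BRCA", "mirna": "miR-A", "gene": "GENE1",
     "max_abs_r": 0.61, "min_p": 1e-5, "n_tools": 7, "validated": True},
    {"cancer": "UCEC", "mirna": "miR-A", "gene": "GENE1",
     "max_abs_r": 0.48, "min_p": 2e-4, "n_tools": 7, "validated": True},
    {"cancer": "UCEC", "mirna": "miR-A", "gene": "GENE2",
     "max_abs_r": 0.22, "min_p": 3e-2, "n_tools": 6, "validated": False},
    {"cancer": "LUAD", "mirna": "miR-B", "gene": "GENE1",
     "max_abs_r": 0.55, "min_p": 1e-4, "n_tools": 2, "validated": False},
]


def query_pairs(records, mirnas=None, genes=None, min_abs_r=0.0,
                max_p=1.0, min_tools=0, validated_only=False):
    """Count the records that satisfy the criteria for each miRNA-gene pair.

    The count can then be used as the line width of that pair in a network.
    """
    widths = Counter()
    for r in records:
        if mirnas and r["mirna"] not in mirnas:
            continue
        if genes and r["gene"] not in genes:
            continue
        if r["max_abs_r"] < min_abs_r or r["min_p"] > max_p:
            continue
        if r["n_tools"] < min_tools:
            continue
        if validated_only and not r["validated"]:
            continue
        widths[(r["mirna"], r["gene"])] += 1
    return widths


result = query_pairs(records, mirnas={"miR-A"}, min_abs_r=0.4,
                     max_p=0.001, min_tools=6)
for (mirna, gene), width in result.items():
    print(mirna, gene, "line width:", width)
```

In this toy data the pair (miR-A, GENE1) passes the filters in two cancer types, so it would be drawn with a wider line than a pair supported by a single record.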

DISCUSSION

The library construction in smRNA-seq selects RNAs by their lengths rather than their types. The libraries obtained for smRNA-seq therefore contain a variety of sncRNA species, and miRNAs represent only a subset of the species obtained by size selection. Although miRNAs are only one of the many sncRNA species in smRNA-seq data sets, they remain the most popular class to study, largely because their biogenesis is relatively well understood and because their post-transcriptional regulatory mechanism is known (42). However, more and more evidence shows that other non-miRNA sncRNAs also play important roles in gene regulation and certain diseases, such as cancers (5,7,43–45). For instance, there is an increasing amount of knowledge regarding the role of snoRNAs in cancer progression, and the information obtained thus far suggests that snoRNAs hold considerable potential for use as novel biomarkers and therapeutic targets in cancer treatment (4–6). It has also been reported that tRFs exhibit features of functional regulatory molecules (46–48), and they have a relatively well described role in disease and infection (49–51). Unfortunately, many researchers ignore the numerous non-miRNA sncRNA species present in the smRNA-seq data. A common barrier is the lack of genomic annotations for these non-miRNA species. In this updated version of the YM500 database, however, we focus not only on miRNAs in smRNA-seq but also on other non-miRNA sncRNAs according to the well-annotated sncRNA database, DASHR. Several functions in the updated database, including ‘Expression’, ‘Survival’ and ‘Cancer’, can assist researchers in investigating sncRNAs.
The concept behind precision medicine is intuitive: individual patients are better modeled by a subgroup of patients, rather than a larger, more general population of patients (52,53). In seeking to adhere to this concept, the ‘Survival’ and ‘Meta-analysis’ sections provide functions to help investigators define specific sample groups according to dozens of clinical characteristics. In the ‘Survival’ section, this concept has been exemplified by hsa-miR-497-5p, which has been reported as a ‘protective’ miRNA in triple negative breast cancer (54). Our analysis shows that high expression of hsa-miR-497-5p is significantly related to good prognosis in triple negative breast cancer but that its expression does not correlate with prognosis in all breast cancer patients (Figure 2). Furthermore, the ‘Meta-analysis’ section contains the same types of results as the ‘Cancer’ section, including differential expression analyses, miRNA–gene interaction and miRNA-related pathway results, but the results in the ‘Meta-analysis’ section are based on two user-defined groups. The ‘Cancer’ section only stores the results calculated for two fixed groups, the adjacent normal and primary tumor tissues of the same cancer type. For example, if a miRNA–gene interaction only exists in some specific sample groups, it cannot be found in the ‘Cancer’ section but might be identified in the ‘Meta-analysis’ section. Moreover, the ‘Specific miRNA–gene interactions’ function in the ‘Cancer’ section helps researchers investigate specific interactions according to a list of criteria that they themselves have defined, with the width of the lines in the interactive network indicating the number of supporting records and hence the confidence.
It is currently a golden era in the field of genomics. Due to the rapidly decreasing costs of sequencing, the obstacles to performing genomic-scale NGS no longer lie in data generation but in data analysis and storage (42). Although researchers in the field of genomics are certainly aware of the mining of novel sncRNAs, many investigators currently choose not to fully analyze the sncRNAs in their smRNA-seq data. Nonetheless, there are still many complexities to be discovered in the sncRNA transcriptome. To help uncover them, we will continue to update the smRNA-seq data sets and sncRNA annotations to provide a comprehensive overview of up-to-date sncRNAs in cancer research.
  54 in total

1.  miRWalk2.0: a comprehensive atlas of microRNA-target interactions.

Authors:  Harsh Dweep; Norbert Gretz
Journal:  Nat Methods       Date:  2015-08       Impact factor: 28.547

Review 2.  MicroRNAs in cancer: biomarkers, functions and therapy.

Authors:  Josie Hayes; Pier Paolo Peruzzi; Sean Lawler
Journal:  Trends Mol Med       Date:  2014-07-12       Impact factor: 11.951

Review 3.  Small nucleolar RNAs functioning and potential roles in cancer.

Authors:  Nithyananda Thorenoor; Ondrej Slaby
Journal:  Tumour Biol       Date:  2014-11-25

4.  Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs.

Authors:  Lee P Lim; Nelson C Lau; Philip Garrett-Engele; Andrew Grimson; Janell M Schelter; John Castle; David P Bartel; Peter S Linsley; Jason M Johnson
Journal:  Nature       Date:  2005-01-30       Impact factor: 49.962

5.  MicroRNA expression profiles classify human cancers.

Authors:  Jun Lu; Gad Getz; Eric A Miska; Ezequiel Alvarez-Saavedra; Justin Lamb; David Peck; Alejandro Sweet-Cordero; Benjamin L Ebert; Raymond H Mak; Adolfo A Ferrando; James R Downing; Tyler Jacks; H Robert Horvitz; Todd R Golub
Journal:  Nature       Date:  2005-06-09       Impact factor: 49.962

6.  miRDB: an online resource for microRNA target prediction and functional annotations.

Authors:  Nathan Wong; Xiaowei Wang
Journal:  Nucleic Acids Res       Date:  2014-11-05       Impact factor: 16.971

7.  Dietary components as epigenetic-regulating agents against cancer.

Authors:  Ling-Chu Chang; Yung-Luen Yu
Journal:  Biomedicine (Taipei)       Date:  2016-02-10

8.  CancerNet: a database for decoding multilevel molecular interactions across diverse cancer types.

Authors:  X Meng; J Wang; C Yuan; X Li; Y Zhou; R Hofestädt; M Chen
Journal:  Oncogenesis       Date:  2015-12-21       Impact factor: 7.485

9.  DriverDB: an exome sequencing database for cancer driver gene identification.

Authors:  Wei-Chung Cheng; I-Fang Chung; Chen-Yang Chen; Hsing-Jen Sun; Jun-Jeng Fen; Wei-Chun Tang; Ting-Yu Chang; Tai-Tong Wong; Hsei-Wei Wang
Journal:  Nucleic Acids Res       Date:  2013-11-07       Impact factor: 16.971

10.  DASHR: database of small human noncoding RNAs.

Authors:  Yuk Yee Leung; Pavel P Kuksa; Alexandre Amlie-Wolf; Otto Valladares; Lyle H Ungar; Sampath Kannan; Brian D Gregory; Li-San Wang
Journal:  Nucleic Acids Res       Date:  2015-11-08       Impact factor: 16.971
